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IMAGE  UNDERSTANDING  AND  INFORMATION  EXTRACTION 


ABSTRACT  f 

This  report  summarizes  the  results  of  the  research  program  on  Image 
Understanding  and  Information  Extraction  at  Purdue  University.  The  report 
covers  the  period  1  April  1977  to  30  September  1977. 

The  objective  of  our  research  is  to  achieve  a  better  understanding  of 
image  structure  and  to  use  this  knowledge  to  develop  techniques  for  image 
analysis  and  processing  tasks,  especially  information  extraction.  Our  emphasis 
is  on  syntactic  decomposition  and  recognition  of  imagery  based  on  scene 
analysis.  It  is  our  expectation  that  this  research  will  form  a  basis  for  the 
development  of  technology  relevant  to  military  applications  of  machine  ex¬ 
traction  of  information  from  aircraft  and  satellite  imagery. 
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IMAGE  UNDERSTANDING  AND  INFORMATION  EXTRACTION 
RESEARCH  SUMMARY 

The  objective  of  our  research  Is  to  achieve  better  understanding  of  Image 
structure  and  to  Improve  the  capability  of  Image  processing  systems  to  extract 
information  from  imagery  and  to  convey  that  Information  In  a  useful  form.  The 
results  of  this  research  are  expected  to  provide  the  basis  for  technology  de¬ 
velopment  relevant  to  military  applications  of  machine  extraction  of  informa¬ 
tion  from  aircraft  and  satellite  imagery. 

A  block  diagram  of  an  Image  Understanding  System  is  shown  in  Fig.  I.  We 
first  consider  the  left  side  of  the  block  diagram.  After  the  sensor  collects 
the  image  data,  the  preprocessor  may  either  compress  it  for  storage  or  trans¬ 
mission  or  it  may  attempt  to  put  the  data  Into  a  form  more  suitable  for  analy¬ 
sis.  Image  segmentation  may  simply  Involve  locating  objects  in  the  Image  or, 
for  complex  scenes,  determination  of  characteristically  different  regions  may 
be  required.  Each  of  the  objects  or  regions  Is  categorized  by  the  classifier 
which  may  use  either  classical  decision-theoretic  methods  or  some  of  the  more 
recently  developed  syntactic  methods.  In  linguistic  terminology,  the  regions 
(objects)  are  primitives,  and  the  classifier  finds  attributes  for  these  primi¬ 
tives.  Finally,  the  structural  analyzer  attempts  to  determine  the  spatial, 
spectral,  and/or  temporal  relationships  among  the  classified  primitives.  The 
output  of  the  "Structure  Analysis"  block  will  be  a  description  (qualitative  as 
well  as  quantitative)  of  the  original  scene.  Notice  that  the  various  blocks 
in  the  system  are  highly  interactive.  Usually,  In  analyzing  a  scene  one  has 
to  go  back  and  forth  through  the  system  several  times. 

Past  research  In  Image  understanding  and  related  areas  at  both  Purdue  and 
elsewhere  has  indicated  that  scene  analysis  can  be  successful  only  if  we  re¬ 
strict  a  priori  the  class  of  scenes  we  are  analyzing.  This  is  reflected  In  the 
right  side  of  the  block  diagram  in  Fig.  1.  A  world  model  is  postulated  for  the 
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class  of  scenes  at  hand.  This  model  Is  then  used  to  guide  each  stage  of  the 
analyzing  system.  The  results  of  each  processing  stage  can  be  used  In  turn 
to  refine  the  world  model. 

Research  In  image  understanding  at  Purdue  concerns  with  all  aspects  of 
the  block  diagram  In  Fig.  1.  However,  the  emphasis  will  lie  in  the  inter¬ 
action  between  the  processing  stages  (left  side  of  Fig.  1),  in  the  searching 
for  suitable  world  models,  and  In  the  interplay  between  the  world  model  and 
the  processing  stages. 

Our  research  progress  during  the  past  six  months  is  presented  in  this 
report. 


Fig.  1  An  Image  Understanding  System 


IMAGE  SEGMENTATION  -  We  are  pursuing  several  approaches  to  image  segmentation, 
including  edge  detection,  region  growing,  and  clustering.  Some  recent  results 
on  edge  detection  is  reported  in  Saiahi  and  Huang.  The  method  consists  of  two 
steps.  In  the  first  step,  a  statistical  test  is  used  to  detect  edge  points. 
This  is  followed  by  a  second  step  where  the  edge  points  are  selected  and 
jointed  to  form  smooth  edges  by  a  tree  growing  and  pruning  procedure. 

IMAGE  ATTRIBUTES  -  The  method  of  describing  shapes  by  Fourier  boundary  de¬ 
scriptors  are  now  being  applied  to  the  recognition  of  three-dimensional  air¬ 
crafts.  Wallace  and  Wlntz  describes  some  very  encouraging  preliminary  results. 
IMAGE  STRUCTURE  -  We  feel  that  In  many  image  analysis  and  recognition  tasks, 
a  purely  syntactic  approach  is  not  adequate.  An  attempt  to  inject  semantic 
considerations  into  the  syntactic  approach  Is  presented  by  Tang  and  Huang. 

The  work  on  the  syntactic  description  of  shapes  continues;  some  results  of 
airplane  Identification  are  described  by  Fu  and  You. 

IMAGE  RECOGNITION  -  Our  work  in  this  area  concentrates  on  the  use  of  con¬ 
textual  Information  to  Increase  classification  accuracy.  Two  reports  are  pre¬ 
sented  (Kit  and  Swain,  and  Yu  and  Fu). 

PREPROCESSING  -  Many  image  processing  problems  call  for  two-dimensional  re¬ 
cursive  filters.  In  designing  these  filters,  the  test  of  stability  Is  a  key 
issue.  O’Connor  and  Huang  have  developed  several  very  efficient  algorithms  of 
testing  stabi 1 ity. 

APPLICATIONS  -  We  are  heavily  Involved  In  two  mission-oriented  projects.  The 
first  is  the  recognition  of  tactical  targets  in  FLIR  images  which  we  are  work¬ 
ing  on  In  collaboration  with  Honeywell  (Carlton  and  Mitchell).  The  second  is 
the  tracking  of  moving  targets  In  collaboration  with  the  U.S.  Army  White  Sand 
Missile  Range  (Mitchell). 


IMPLEMENTATION  -  As  we  get  Into  more  mlsslon-orlented  projects  where  real¬ 
time  processing  is  important,  we  begin  to  feel  the  inadequacy  of  general- 
purpose  serially-processing  computers.  Therefore,  we  are  studying  computer 
architectures  for  image  processing.  Some  considerations  are  presented  by 
Fu  and  Keng. 


EDGE  DETECTION  AND  SMOOTHING 
A.  Salahi  and  T.  S.  Huang 

INTRODUCTION 

In  [I]  a  test  for  edge  detection  was  described.  In  this  report  we  shall 
describe  another  algorithm  which  removes  cracks  and  parasitic  branches  and 
fills  missed  edges. 

DESCRIPTION  OF  SMOOTHING  ALGORITHM 

I  A  scanner  scans  the  picture  (according  to  some  processing  sequence)  to 
find  a  starting  point  for  an  "edge  follower"  then  stores  the  address  of  this 
point  in  R  and  pushes  this  address  into  some  stack;  say  stack  2  which  edge 
follower  will  use  to  pick  up  the  starttng  point  to  follow  the  edge. 

II  Stack  2  will  be  popped  up  to  give  the  root  of  a  tree;  then  a  tree  will  be 
induced  Into  scattered  edges  around  this  point  by  using  a  breadth  search  method 
(B.  S.  M.)  [2],  The  expand  part  of  B.S.M.  will  be  described  later.  This  tree 
will  be  expanded  through  depth  <_  limit;  then  the  tree  will  be  pruned  and  re¬ 
maining  leaves  of  tree  will  be  pushed  Into  stack  2  and  part  II  will  be  repeat¬ 
ed  until  stack  2  becomes  empty. 

III  Scanner  will  be  resumed  from  next  point  to  R;  If  It  finds  another  start¬ 
ing  point.  It  stores  that  point  Into  R  and  pushes  Its  address  Into  stack  2 
and  will  go  to  (ll);  If  It  exhausts  all  the  picture  points  then  it  will  stop. 
Pruning  a  tree 

To  prune  a  tree  (Fig.  1)  we  us«  a  very  simple  way: 

1.  Mark  the  leaves  with  a  maximum  depth  of  the  tree  with  "-" 

2.  For  each  node  with  change  Its  Into  "+"  and  put  a  Into  Its 
predecessor  node  (if  It  exists) 

3.  If  there  Is  no  node  with  "-"  In  the  tree,  then  remove  all  nodes  with¬ 


out  any  "+"  mark 


PROCESSING  SEQUENCE 


Because  edge  smoothing  through  edge  fc’ lowing  Is  a  sequential  operation, 
the  sequence  of  processing  of  different  points  (to  find  the  starting  point  for 
an  edge  follower)  has  some  effect  although  negligible  on  final  results.  By 
processing  sequence  we  mean  a  one  to  one  mapping  from  a  two  dimensional 
addressing  Into  a  linear  addressing.  Let  I  *=  { I.2.... IRW),  J  =  {1,2,...ICN} 

K  -  {  1,2,...IRW*ICN}  and  let  6  -  (IRW,  ICN,  IqJq)  where  IRW,  ICN  are  the 
number  of  rows  and  columns  in  digitized  picture  and  Oq»Jq)  be  the  starting 
point.  For  simplicity  let  OqJq)  “  (1,1).  then  processing  sequence  will  be 
♦0  :  IXJ  K 

Some  important  examples  are 


♦J 

•m 

(l, 

,J) 

-  (|- 

1)  ICN  +  j 

ve 

•» 

(!, 

J) 

»  (j- 

i 

+ 

""(I 

-1)  ICN 

1  +  J 

i 

is 

odd 

♦3 

=  < 

J 

ICN  - 

j+t 

i 

is 

even 

f  (j 

-1)  IRW  +  1 

J 

is 

odd 

-i 

j 

B 

IRW  - 

i+1 

j 

is 

even 

Subroutine  next  in  our  algorithm  uses  to  find  the  next  point  for  processing. 
Subroutine  expand  in  our  algorithm  generates  the  successor  nodes  of  each  node 
and  will  be  discussed  later. 

DATA  STRUCTURE 

Because  the  technical  description  of  our  algorithm  depends  on  data 
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Picture  Points 

A  three  bit  data  structure  Is  sufficient  to  represent  each  point  of 
picture 


d(i,j) 


d_  d,  dn 

2  l  0 


Fig.  2 


V'.J) 


il  If  there  Is  a  vertical  edge  between  (l,j)  and  (i,j+l) 
vp  otherwise 


r. 


Vu) 


l  if  there  is  a  horizontal  edge  between  (i,j)  and  (i+l,j) 
(0  otherwise 


d2(i,j)  - 


i  point  (i,j)  has  already  been  reached  by  edge  followers 
^0  otherwise 

One  stackandone  linear  array  namely  stack  2  and  A(lSS)  will  be  used  for  edge 
follower  and  array  A  stores  the  tree  In  a  layered  form.  Data  structure  for 
both  stack  2  and  array  A  are  the  same  (Fig.  3.)  After  the  tree  was  expanded  and 
stored  and  pruned  the  content  of  array  A  will  be  similar  to  a  new  program; 
namely  we  can  simulate  a  processor  which  accepts  Its  Instructions  from  array  A 
and  performs  picture  smoothing. 


predecessor 


$  IRC  Dir  Con 


12 


12 


12 


1  1 


Pig.  3 
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(a)  Con :  Partial  field  Con  Is  one  bit; it  shows  edge  approached  point 
(!,j)  through  leal  (Con  -  1)  or  imaginary  edge  (Con*0)  Fig.  4  . 


O.J) 
- + 


Con  “  0 

Fig.  i» 


O.J) 

- - * 


Con  **  1 


(b)  Pi r:  Edge  follower  can  move  in  four  directions  (Fig.  5).  Dir  partial 
field  is  two  bits  and  contains  the  direction  edge  follower  used  to 
move  to  point  (i ,j) . 


(,*J)  O.J) 

Dir  -  0  Dir  -  I  \ 


(I»J>  O.J)  I 

Dir  »  2  Dir  -  3 


Fig.  5 


(c)  IRC  I s  one  b i t 

1  means  this  edge  should  remain  in  final  image 


IRC 


V^O  means  this  edge  should  be  removed 

(d)  £  is  one  bit  and  it  has  been  used  as  a  separator  of  different  layers 
of  array  A;  ail  members  of  a  layer  have  the  same  depth  in  a  tree. 

(e)  (i,J)  is  the  address  of  a  point  in  a  digitized  picture. 

(f)  predecessor, points  to  predecessor  of  each  node  in  Array  A. 
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ALGORITHM  FOR  SMOOTHING 

Procedure  Smooth  (d,  IRW,  ICN,  Limit) 

Integer  Array  d [ 1 :  IRW,  1:  ICN],  A[ 1:200] ,  STACK  2  [1:200]; 

Integer  I,  J,  IR,  JR,  ICN,  IRW,  Limit,  Z ,  PTR,  PR2,  Depth,  ISWO; 

Switch  V:  -  LL,  LAI,  LL,  LA2,  LL,  LA3,  LL,  LA4,  LA5,  LL,  LA6,  LL,  LA7, 
LL,  LA8,  LL; 

Label  LI,  L2,  L3,  Ll»,  L5,  L6,  L7,  L8; 

Comment 

Algorithm  has  been  written  in  Semi-Algol  to  be  more  descriptive; 
Comment 

Array  d  is  picture,  A  Is  linear  array,  STACIf  2  is  stack; 

LI:  (IR.JR):  -  (1,1)  ; 

L2:  (l,J)  :  -  (IR,JR); 

L3:  (l,J)  :  *  Next  (l,J); 

JJF  (K,J)  >  (IRW,  ICN))  Then  Go  to  L8; 

Jf  (dQ(l,J)  v  djO.J))  a  d2(l,J)  -  1  Then 
begin 

(IR.JR):  -  (I , J) ;  d2(l,J):  -  1;  ISWO:  -  0; 

Jf,  d0(l.J)  -  1  Then  [0, 1, J, 0,0, 3,1]  -*•  STACK  2  else 

[0,I,J-1,  0,0, 3,0]  Stack  2 

end 

else  Go  To  L3  ; 

H:  depth:  «*  0;  PTR:  -  0;  Stack  2  -*•  A(l);  $  -*■  A(2) ;  Z:«  2; 

L5:  PTR:  -  PTR  +  1; 

If  A (PTR)  -  $  Then 


•e 


If  PTR  »  Z  Then 
begin 
Z:  -  Z-l; 

J_f  IS  WO  i*  0  Then 
begin 

IRH:  -  1;  Go  To  L6 
end 
else 
begin 

IRH:  -  0;  Go  To  L6 
end 


end 


begin 


else 

depth:  ■  depth  +  1; 

If  depth  ■  Limit  Then 

begin  IRH:  ■  1;  Go  To  L6  end  else 
begin  z :  ■  Z+l ;  A(Z)  $;  Go  To  L6 


end 

end 

else 


end 


expand  (A,  PTR,  Z ) ;  Go  To  L5  end  ; 
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L6:  for  IRVS:  -  Z  Step  -  1  until  1  do 


begin 


If  A(IRVS)  -  $  Then 


begin 


depth:  ■  depth  -  I ; 


IRH:  -  0 


begin 


J_f  IRH  -  1  Then 
begin 

If  depth  •  Limit  Then 

begin  Stack  2  «-  A(fRVS)  end; 

A(IRVS) :  -  A(IRVS).  OR.  8 
end; 

K1:  -  A(IRVS),  AND.  8; 

Lf  K1  i*  0  Then 
begin 

IZX:  -  predecessor  (A(PTR))  ; 

IZX  *  0  Then 

A(IZX) :  -  A(IZX) .  OR.  8 


end  ; 


K2:  -  A(lRVS).  AND  .  15; 
Go  To  V(K2+1) ; 
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LAI: 

d(i,j):  - 

d(l,j).  AND  , 

.  (-2); 

Go  To  LL; 

LA2: 

d(i+l,j): 

-  d(i+lj)  . 

AND  . 

(-1);  Go  To  LL; 

LA3: 

d(I,j+l): 

-  d(l  J+l)  . 

AND  . 

(-2);  Go  To  LL; 

LA4: 

d(i »j) :  - 

d(l  J)  .  AND 

.  (-0 

;  Go  To 

LL; 

LA5: 

d(I.J):  - 

d(l,j)  .  OR  , 

.  2;  Go 

To  LL; 

LA6 : 

d(i+l,J): 

-  d(l+!,J)  . 

OR  .  1 

;  Go  To 

LL; 

LA7: 

d  ( I ,j+l) : 

-  d(l ,J+1)  . 

OR  .  2 

:  Go  To 

LL; 

LA8: 

d(I,j):  - 

d(l,J)  .  OR  , 

.  1  i  Go 

To  LL; 

LL: 

L7:  \f_  Stack  2  Is  empty  Then  Go  To  L2  else 

begin 

ISWO:  -  1;  Go  To  L4 


expand  (See  Fig.  6) 

Objective:  This  algorithm  generates  the  successor  nodes  of  each  node  In 
Breath  Search  Method. 

Let  S  ■  (!,J)  be  a  point  In  which  edge  entered  with  direction^;  from  S  it  can 
go  at  most  to  2.  neighboring  points  A,  B,  C. 
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begin 


d2 (S) :  -  1; 

If  SA  exists  Then  begin 

ft 

\f_  d_  (A)  -  1  Then  remove  SA  else  [0,A,0,0,  <  x+3  >,!]-*■  Stack  2 


begin 


1 f  (A)  "  0  Then  begin 

If  AD  or  AE  exist  Then  [0,A,0,0,<  x+3  >,  0]  -*•  Stack  2 


end; 

If  SB  exl sts  Then  begin 

Jf_  d2  (B)  -  1  Then  remove  SB  else  [0,B,0,0,  <  x  >,  1]  Stack  2 


begin 


If  d.  (B)  ■  0  Then  begin 


If  BE  or  BF  or  BG  exist  Then  [0,B,0,0,x,0]  -*■  Stack  2 


If  SC  exists  Then  begin 


I f  dj  (C)  ■  1  Then  remove  SC  else  [0,C,0(0,  <  x+l  >,  1]  +  Stack  2 


begin 
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To  show  the  effect  of  algorithm  let  Fig.  7  be  the  result  of  edge  detection. 
Table  1  shows  Array  A  before  pruning.  Table  2  shows  Array  A  after  pruning. 
Fig.  8  shows  Fig.  7  after  smoothing.  Contents  of  A  can  be  translated  Into 
the  following  program. 


PC 

OP 

dir 

Add 

22. 

Nop 

21. 

Nop 

20. 

Fill 

0 

7,6 

19. 

Nop  ' 

18. 

Nop 

17. 

REMOVE 

2 

7,3 

16, 

Nop 

15. 

Nop 

14. 

Remove 

3 

6.3 

13. 

Nop 

12. 

Fill 

3 

6.4 

11. 

Nop 

10. 

Nop 

9. 

Remove 

0 

4,5 

8. 

Nop 

7. 

Nop 

6. 

fill 

3 

4,4 

5. 

Nop 

4. 

Nop 

3. 

Remove 

2 

3,2 

2. 

Nop 

1. 

Nop 

RESULTS  (Figs.  9  and  10) 


Let  us  assume  Algorithm  1  to  be  the  algorithm  which  we  used  for  statis¬ 
tical  edge  detection  [I]  and  Algorithm  2  the  present  algorithm  for  smoothing. 

Let  P{  |t|  <_  tr>  *  1-a  and  _d  be  the  maximum  depth  of  tree  which  edge 
follower  uses.  Then 

Part  (a)  of  each  figure  is  the  result  of  Algorithm  1  under  Limit  *  t 
(b)  shows  (a)  after  using  algorithm  2 


REFERENCE 

[1]  A.  Salahi  and  K.  S.  Fu,  "A  Decision-Theoretic  Approach  to  Edge  Detection, 
Purdue  Unlv.  Tech.  Rep.,  TR-EE  77“16,  March  1977. 

[2]  Nils  J.  Nilsson,  Problem  Solving  Method  in  Artificial  Intelligence. 

McGraw-Hill,  N.Y.  1971.  “ — - - * - 


SHAPE  INFORMATION  EXTRACTION 
T.  Wallace  and  P.  A.  Wlntz 

INTRODUCTION 

In  our  last  final  report  [1],  a  practical  shape  Information  extraction 
algorithm  was  presented  which  detected  the  presence  of  an  airplane  in  an 
actual  photograph.  The  classification  was  made  using  Fourier  descriptors, 
which  were  computed  from  the  boundary  of  the  plane.  The  boundary  was  in  turn 
located  by  the  BLOB  region  detection  algorithm.  In  our  last  quarterly  report 
[2],  we  presented  several  approaches  to  improving  and  extending  this  algorithm. 
This  report  describes  certain  theoretical  results  of  interest  in  connection 
with  three  dimensional  shape  recognition  using  Fourier  Descriptors.  In 
addition,  preliminary  experimental  results  are  presented. 

FOURIER  DESCRIPTORS 

The  Three-dimensional  Problem 

The  Fourier  descriptor  (FD)  algorithm  Is  a  very  powerful  technique  for 
classifying  shapes,  but  work  to  date  has  concentrated  or  two  elwensional 
recognition  problem.  Certain  properties  of  FDs  facilitate  their  use  In  three 
dimensional  object  detection,  providing  both  an  Identification  of  the  unknown 
object  and  an  estimate  of  Its  orientation  relative  to  the  camera.  Specifi¬ 
cally,  It  has  been  shown  [3]  that  averaging  the  FDs  of  two  different  shapes 
(frequency  domain)  yields  a  FD  which  will  Inverse  transform  to  a  shape  which 
appears  to  be  an  "average"  contour,  intermediate  In  shape  between  the  two 
original  contours.  The  data  base  which  must  be  stored  to  represent  a  three- 
dimensional  object  may  be  reduced  by  using  fewer  projections,  and  "inter¬ 
polating"  between  them  in  the  frequency  domain.  This  approach  also  enables 
a  more  accurate  estimation  of  the  actual  orientation  In  space  of  the  object 
relative  to  the  camera.  Previous  work  on  this  estimation  problem  [4]  '■as 
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assumed  that  the  orientation  of  the  unknown  object  is  that  of  the  nearest 
reference  projection.  This  evidently  limits  estimation  accuracy  to  the 
resolution  of  the  reference  projection  set. 

Interpolation  and  Sampling  Error 

One  theoretical  problem  concerns  the  sample  spacing  used  in  sampling  a 
time  domain  contour.  As  explained  in  our  last  final  report,  a  uniform 
sampling  strategy  Is  employed  In  the  present  algorithm.  In  order  to  facili¬ 
tate  analysis  of  a  wide  variety  of  shapes.  While  non-uniform  sampling  can 
result  In  faster  convergence  of  a  FD  [3l,  there  are  obvious  complications 
Involved  in  defining  such  a  sampling  strategy  for  general  shapes.  When 
operations  on  FDs  are  made  in  the  frequency  domain,  there  is  no  guarantee  that 
the  resulting  time  domain  representation  will  have  uniform  sampling.  Hence, 
even  if  an  unknown  contour  appeared  Identical  to  an  "average"  contour  com¬ 
puted  by  using  linear  conMnatlons  of  known  FDs,  different  sample  spacing 
could  result  in  some  difference  between  the  two  FDs. 

Consider  the  case  in  which  an  unknown  projection  lies  directly  between 
two  library  projections.  We  postulate  that  some  linear  combination  of  the 
library  projections  A ( 1 )  and  A (2)  will  give  a  good  approximation  to  the  un¬ 
known  projection  X: 

X  -  k(l )  •  A ( 1 )  +  k(2)  .  A (2 ) 
where  the  k(i)  are  scalers  with  k(l)  +  k{2)  ■  1.0 

and  the  A (I )  and  X  are  FDs.  Due  to  linearity,  we  can  say  that  this  frequency 
domain  interpolation  FD  X  has  a  transform  x  which  can  also  be  computed  by 
interpolating  between  space  domain  vectors  using  the  same  coefficients. 

x  -  k(l)  •  a ( I )  +  k(2)  •  a(2) 

where  the  a ( I )  are  the  IDFTs  of  the  A(I).  It  Is  now  obvious  that  the  point 
spacing  of  x  need  not  be  uniform,  and  it  is  also  easy  to  compute  the  point 
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density  error  for  some  test  Interpolations.  We  merely  compute  the  vector  x 
In  the  space  domain  using  (2),  then  resample  that  vector  uniformly  to  get 
some  x',  and  finally  compute  the  m.s.  distance  between  x  and  x“.  Experiments 
with  two  shapes  whose  normalized  FDs  (NFDs)  had  a  distance  of  about  .3  (a 
distance  less  than  .1  is  generally  used  as  a  classification  threshold)  show** 
ed  a  point  density  error  190  to  200  times  less  than  the  distance  between  the 
original  NFDs,  with  k(l)  ■  k (2)  ■  .5.  The  number  of  points  used  in  the  space 
domain  vectors  had  a  slight  effect  on  the  error,  with  more  densely  sampled 
vectors  producing  slightly  less  error,  it  can  be  concluded  that  this  problem 
should  not  have  a  noticable  effect  on  the  algorithm,  since  In  practice,  ad¬ 
jacent  projections  can  be  expected  to  have  a  NFD  distance  less  than  .3.  This 
fact  should  further  minimize  the  point  density  error. 

The  Estimation  Problem 

In  the  example  discussed  above,  a  projection  was  assumed  to  He  directly 
between  two  library  projections,  and  the  experiment  was  performed  accordingly. 
In  general,  however,  any  random  projection  will  not  lie  on  the  grid  defined 
by  a  library  of  projections.  Hence  more  than  two  library  projections  must  be 
used  to  perform  the  estimation.  If  a  rectangular  grid  of  projections  Is  used, 
it  would  seem  reasonable  to  do  the  estimation  based  on  four  library  pro¬ 
jections,  but  consider  Instead  the  general  case  of  estimating  an  n-vector 

X (k)  as  a  linear  combination  of  m  n-vectors  (Y j (k) ,  1  <  I  <  m: 
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where 


D,(k)  -  Y  j  (k)  -  Ym(k) 

The  total  m.s.  error  Is  given  by: 
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k-1 

«  E  {  X2(k)  -  2  X(k)  [  E1  a.  D. (k)  +  Y  (k)]  + 
k-1  I-I  1  '  m 


m-1  « 

[E  a.  D.(k)  +  Y  (k)r> 
1—1  m 


Differentiating  with  respect 

to 

the  < 

a,. 

setting 

each 

resulting  equation 

to  zero,  and  rearranging: 

n 

m-1 

n 

2  .  v 

E  2[X(k)-Y  (k)] 
k-1  m 

DI 

(k)  - 

E 

J-l 

E  a 
k»!  J 

D,(k) 

Dj.(k) 

+  a{Dj(k) 

n 

m-1 

n 

O 

E  2[X(k)-Y  (k) ] 
k-1  m 

°i 

(k)  - 

E 

J-l 

a .  [  l 

J  k-1 

D,(k) 

Dj(k) 

+  a | D j (k) ] 

For  1  <  I  <  m-1 . 

This  expression  Is  a  set  of  m-1  equations  In  m-1  unknowns  which  can  be  solved 
for  the  unknowns  a.,  a  is  then  computed  from 


Data  Reduction 

The  time  required  to  execute  the  above  estimation  algorithm  Is  dependent 
on  the  dimension  (n)  of  the  vectors.  It  Is  thus  desirable  to  reduce  the  di¬ 
mension  of  the  vectors  as  far  as  possible  without  degrading  the  classifica¬ 
tion  performance.  Equally  important  Is  the  problem  of  storage  of  library 
data  representing  a  three-dimensional  object.  Previous  FD  classification 
results  have  indicated  that  there  Is  no  advantage  In  using  more  than  30 
(complex)  coefficients.  In  fact,  quite  good  results  have  been  obtained  with 
only  ll»,  although  there  was  a  slight  degradation  In  performance  when  compared 

with  30. 

The  obvious  approach  Is  to  estimate  the  autocorrelation  matrix  or  co- 
variance  matrix  of  the  data,  and  find  the  eigenvalues  and  eigenvectors  which 
provide  optimal  data  compression.  There  can  be  some  difficulty  In  computing 
eigenvalues  and  eigenvectors  of  a  60  by  60  matrix.  (Our  feature  vector  con¬ 
sists  of  30  complex  coefficients.)  One  way  to  reduce  the  size  of  this  matrix 
Is  to  convert  the  data  to  30  real  coefficients.  There  are  two  ways  that  this 
might  be  done.  The  most  obvious  way  is  to  simply  take  the  magnitudes  of  the 
FD  coefficients,  since  classifications  based  on  magnitude  information  alone 
have  been  shown  to  be  quite  effective.  However,  if  the  data  has  bilateral 
symmetry,  the  associated  NFDs  should  automatically  be  real.  Even  if  the 
data  does  not  have  this  symmetry,  the  normalization  procedure  tends  to  mini¬ 
mize  the  magnitudes  of  the  imaginary  parts  of  the  NFD,  and  correspondingly 
minimize  their  contribution  to  the  classification.  Hence  virtually  all  the 
information  can  be  preserved  by  simply  taking  the  real  part  of  each  NFD  co¬ 
efficient.  This  is  the  approach  that  was  used  In  the  experiment  described 
below.  The  eigenvalue/eigenvector  computation  was  performed  using  just  the 
real  parts  of  the  data,  but  the  transformation  matrix  thus  obtained  was 


applied  to  both  the  real  and  Imaginary  parts  of  the  data,  yielding  a  complex 
vector  in  the  transformed  space. 

Using  the  covariance  matrix  approach  results  In  maximum  compression  of 
the  data,  but  the  time  to  compute  the  transformed  vectors  is  slightly  in¬ 
creased  due  to  the  necessity  to  subtract  the  mean  vector.  In  cases  where  ex¬ 
treme  data  reduction  is  required,  this  might  be  the  best  approach,  but  since 
the  present  algorithm  requires  keeping  virtually  all  the  information,  the 
advantages  should  be  slight.  In  other  words,  the  Information  contained  in 
five  to  ten  coefficients  of  the  transformed  space  should  be  essentially  the 
same  regardless  of  which  method  is  employed.  On  the  other  hand  the  auto¬ 
correlation  matrix  method  is  slightly  simpler  and  lends  Itself  to  a  more 
intuitive  interpretation. 

The  basic  FD  classification  procedure  can  be  viewed  as  reducing  a  shape 
to  circular  or  elliptical  basis  shapes  of  various  frequencies.  The  trans¬ 
formed  FD  classification  procedure  makes  use  of  a  more  efficient  set  of  basis 
shapes.  Recall  that  the  normalization  procedure  for  FDs  results  in  a  dc 
component  defined  to  be  zero  (a(0)  ■  0) ,  and  a  fundamental  frequency  com¬ 
ponent  defined  to  be  unity  (a(l)  ■  1.0).  Since  these  coefficients  would  make 
the  autocorrelation  or  covariance  matrix  singular,  they  are  not  included  in 
the  original  data  vectors.  They  of  course  contribute  nothing  to  the  classi¬ 
fication  process.  In  order  to  see  what  sort  of  basis  shapes  we  are  classify¬ 
ing  with,  it  is  only  necessary  to  inverse  transform  each  transformed  co¬ 
ordinate  In  turn,  and  then  insert  a  fundamental  frequency  of  some  appropriate 
magnitude.  We  obtain  the  FD  of  some  shape,  and  performing  an  inverse  FFT 
shows  us  what  it  looks  like. 

The  experiment  originally  performed  to  test  the  FD  algorithm  was  repeated 
to  test  the  data  reduction  procedure.  Recall  that  20  low  resolution  airplane 


silhouettes  after  the  FDs  were  calculated  and  normalized.  Using  a  mean 
square  distance  criterion  and  30  coefficients,  the  classification  accuracy 
was  35%.  After  computing  eigenvectors  and  eigenvalues  of  the  autocorrelation 
matrix  of  the  NFDs,  and  making  the  appropriate  linear  transformation,  a  classi¬ 
fication  using  four  transformed  coefficients  achieved  the  same  figure  of  35% 
classification  accuracy.  Figures  1-4  show  the  basis  shapes  associated  with 
these  four  coefficients. 

THE  3-D  ALGORITHM 

An  experiment  was  performed  in  which  unknown  aircraft  outlines  were 

t 

identified  and  their  orientation  in  space  estimated  using  the  above  results. 
First,  a  set  of  aircraft  was  synthesized  using  a  graphics  approach.  Three- 
dimensional  approximations  were  constructed  for  six  different  aircraft,  a 
Mirage,  a  Mig,  a  Phantom,  an  F104,  an  F105»  and  a  B57.  Figure  5  shows  rep¬ 
resentative  images  generated  by  this  program.  These  three  dimensional  images 
were  then  rotated  through  appropriate  angles  to  create  a  library  of  pro¬ 
jections.  The  program  was  first  given  the  library,  and  then  given  randomly 
selected  orientations  to  identify. 

The  experiment  of  Dudani  et  al  [4]  was  very  similar,  but  several  impor¬ 
tant  differences  should  be  noted.  First,  the  data  used  by  Dudani  was  con¬ 
structed  using  model  aircraft  and  a  television  camera  hookup.  It  might 
appear  that  this  Is  a  more  realistic  approach,  as  well  as  a  more  demanding 
experiment  than  one  using  graphically  generated  data.  However  there  are  two 
problems  with  this  method  which  the  graphical  method  avoids.  First,  the 
resolution  of  the  mechanical  mount  used  by  Dudani  was  5  degrees.  This  rep¬ 
resents  an  error  in  data  generation  which  is  avoided  by  the  more  exact 
graphics  approach.  Second,  since  the  camera  is  a  finite  distance  from  the 
model,  parallax  problems  affect  the  images,  making  the  camera  image  different 
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from  the  Image  received  from  a  long  distance.  Since  most  practical  photo¬ 
graphs  of  actual  aircraft  in  flight  would  be  taken  at  a  large  distance,  this 
error  is  undesirable. 

In  addition  to  the  above  considerations,  it  would  probably  be  easier  to 
use  graphics  techniques  in  a  practical  system,  since  accurate  graphical  rep¬ 
resentations  constructed  from  blueprints  would  probably  generate  library  data 
faster  and  more  accurately  than  model-tv  camera  setups. 

A  major  advantage  of  Dudani's  approach  is  the  accuracy  of  the  aircraft 
shape.  Our  graphics  program  approximates  each  plane  by  using  about  50-100 
(geometric)  planes.  A  more  elaborate  program  could  generate  an  arbitrarily 
accurate  representation  of  each  aircraft,  with  corresponding  increase  in 
computation  time.  The  present  data  gives  a  reasonably  good  approximation  to 
each  aircraft,  although  the  small  detail  is  lacking.  The  effect  on  the 
classifications  is  probably  to  Increase  their  difficulty,  since  certain  minor 
features  are  missing.  On  the  other  hand,  the  data  can  probably  be  represented 
by  fewer  projections,  due  to  the  reduced  complexity. 

The  basic  problem  considered  by  Cudani  was  classifying  unknown  aircraft 
Images  oriented  at  5  degree  intervals  in  a  140  degree  by  90  degree  sector. 

Each  aircraft  was  represented  by  a  library  of  moment  feature  vectors  computed 
from  551  projections  within  this  sector.  The  classification  was  performed  by 
computing  distances  from  a  moment  feature  vector  of  the  unknown  image  to  the 
moment  feature  vectors  of  the  library  images,  and  then  classifying  using  a 
distance-weighted  k-nearest  neighbor  rule.  Note  that  the  images  used  by 
Dudani  did  not  contain  any  mirror  image  pairs,  and  hence  are  obviously  not 
all  the  images  which  can  be  theoretically  recognized.  In  fact,  a  little  re¬ 
flection  will  convince  one  that  if  an  object  has  enough  assymetry,  it  can  be 
recognized  at  any  angle  at  all,  and  that  angle  identified.  In  the  case  of 


aircraft,  there  is  generally  bilateral  symmetry,  but  this  does  not  necessarily 
greatly  limit  the  set  of  angles  which  can  be  theoretically  recognized. 

Our  algorithm  recognizes  aircraft  outlines  taken  from  a  sector  of  180  by 
180  degrees,  i.e.  a  hemisphere.  Note  also  that  if  the  angles  near  the  front 
view  and  rear  view  of  the  aircraft  are  deleted,  the  problem  is  much  easier, 
since  the  shapes  vary  much  more  radically  when  large  surfaces  are  viewed 
almost  edgewise.  Dudani's  consideration  of  only  140  degrees  reduces  this 
problem.  Our  algorithm  also  recognizes  random  projections.  There  is  no 
quantization  of  random  projections  corresponding  to  Dudani's  5  degree  incre¬ 
ments.  Finally,  the  first  version  of  our  algorithm  uses  only  99  projections 
to  represent  an  aircraft  over  the  hemisphere,  which  represents  a  density  of 
projections  14.3  times  less  than  that  used  by  Dudani. 

The  actual  classification  program  proceeds  as  follows.  First,  the 
library  of  projections  is  computed,  and  the  NFD  of  each  projection  is  computed. 
The  autocorrelation  matrix  of  the  NFDs  is  computed,  and  an  eigenvalue-eigen¬ 
vector  transformation  reduces  the  data  dimensionality  from  30  complex  numbers 
to  5  complex  numbers.  The  real  parts  of  the  complex  numbers  are  used  to  com¬ 
pute  the  autocorrelation  matrix,  but  the  complex  parts  of  the  transformed  co¬ 
efficients  are  kept  to  assist  In  the  classification. 

Next  the  m.s.  distance  from  a  given  unknown  contour  to  each  library 
vector  is  computed.  The  distance  to  the  nearest  library  contour  is  saved  as 
the  current  best  estimate  of  the  minimum  m.s.  distance  achievable.  The  pro¬ 
jections  adjacent  to  the  nearest  library  projection  are  investigated  by  the 
m.s.  estimation  algorithm  described  above  in  an  attempt  to  interpolate  be¬ 
tween  the  library  projections. 

The  interpretation  of  the  estimation  coefficients  returned  by  the  esti¬ 
mation  subroutine  is  somewhat  heuristic,  and  goes  something  like  this. 
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This  routine  Is  constrained  to  return  four  numbers  whose  sum  Is  1.0.  It 
often  happens  that  two  of  the  vectors  being  used  to  estimate  the  unknown 
contour  are  not  very  similar  to  the  unknown  contour,  but  are  quite  similar 
to  each  other.  In  this  case,  the  estimation  coefficients  are  of  similar 
magnitudes  and  opposite  sign,  such  as  2.0  and  -2.05.  What  the  estimation 
algorithm  is  doing  is  using  the  difference  vector  to  help  generate  the  opti- 
mum  m.s.  est image  of  the  unknown  vector.  We  of  course  do  not  want  to  allow 
this  kind  of  estimation,  since  it  is  inconsistent  with  our  theory  of  inter¬ 
polation  of  FDs.  Another  thing  which  is  commonly  observed  when  the  unknown 
vector  differs  from  the  library  vectors  being  used  to  estimate  it  is  a  set 
of  large  positive  and  negative  estimation  coefficients  being  returned.  This 
again  just  tells  us  that  we  cannot  expect  to  find  a  reasonable  interpolated 
FD  In  the  sector  determined  by  that  set  of  library  projections. 

The  heuristic  solution  to  these  effects  Is  as  follows.  First,  we  quit 
looking  In  a  particular  sector  if  the  estimation  coefficients  returned  are 
too  large  in  magnitude.  The  algorithm  is  not  very  sensitive  to  this  magnitude 
and  1.5  to  2.0  is  usually  used.  Also,  if  two  coefficients  sum  to  a  small 
number  (.1),  but  have  relatively  large  magnitudes  (>.5)  they  are  assumed  to 
be  cancelling  coefficients,  and  are  deleted  from  the  estimation  set.  The 
estimation  process  is  then  repeated  with  the  remaining  two  vectors  being  used 
to  estimate  the  unknown  set.  Similarly,  any  negative  coefficients  are  de¬ 
leted  from  an  estimation,  and  the  remaining  two  or  three  are  used  to  repeat 
the  estimation  process.  When  an  estimat"  n  of  the  unknown  vector  in  terms 
of  two,  three,  or  four  adjacent  library  projection  vectors  is  achieved  in 
which  all  the  coefficients  lie  between  zero  and  one,  the  distance  is  compar¬ 
ed  to  the  minimum  distance  achieved  so  far.  If  the  new  distance  is  less,  the 
minimum  distance  Is  updated. 


This  process  may  be  repeated  for  the  k  nearest  library  projections, 
where  k  is  optional,  and  is  generally  in  the  range  of  4-10.  If  the  distances 
to  the  nearest  k  projections  are  approximately  equal,  the  full  k  projections 
will  be  investigated.  However,  projections  whose  distances  are  more  than  1.5 
to  2.0  times  greater  than  the  minimum  distance  are  not  investigated.  Each 
library  projection  has  one,  two,  or  four  sectors  surrounding  it  which  must  be 
investigated  by  the  estimation  subroutine.  (If  the  sector  is  in  the  middle 
of  the  library  set,  there  are  four,  if  it  is  on  the  border,  there  are  two, 
and  if  it  is  in  a  corner,  there  is  one.)  After  the  desired  number  of  poss¬ 
ible  sectors  are  investigated,  there  are  two  possible  procedures.  The  esti¬ 
mated  orientation  is  taken  to  be  that  of  the  original  nearest  library  vector, 
if  the  estimation  fails  to  improve  on  this  distance.  If  the  estimation  pro¬ 
cedure  Is  successful,  the  orientation  is  computed  by  multiplying  the  orienta¬ 
tions  of  the  vectors  used  in  the  estimation  by  their  appropriate  coefficients. 

Results  to  date  using  6  aircraft,  and  classifying  50  unknown  images  for 
each  one  show  a  classification  accuracy  of  80$;  overall.  The  classification 
accuracy  is  about  12%  if  the  estimation  routine  is  not  used. 

It  is  expected  that  this  figure  can  be  improved  by  making  use  of  the 
real  and  imaginary  parts  of  the  data  when  computing  the  eigenvector  trans¬ 
formation  matrix.  Also,  the  spacing  of  library  projections  used  in  this  ex¬ 
periment  Is  non-uniform  in  an  attempt  to  increase  the  Fourier  space  distance 
uniformity  of  the  projection  set,  but  it  is  not  optimum.  Finally,  it  may  be 
necessary  to  Increase  the  density  of  projections  somewhat  to  bring  the  classi¬ 
fication  accuracy  up  to  that  achieved  by  Dudani.  The  current  attempt  to  get 
by  with  a  projection  density  14.3  times  lower  than  Dudani  may  be  too  optimistic 

Given  a  chain  code  representation  of  the  outline  of  an  aircraft  pro¬ 
jection,  the  normalized  FD  is  computed  in  about  2.76  sec.  This  time  includes 


an  FFT  which  Is  of  length  512  or  1024,  depending  on  the  length  of  the  parti¬ 
cular  contour.  This  normalized  FD  is  then  classified  and  its  orientation 
estimated  in  about  2.38  sec.  These  times  are  for  a  11A5  with  floating 
point  hardware.  The  program  itself  is  written  in  Fortran  and  is  a  research 
tool  rather  than  a  highly  efficient  implementation  of  the  algorithm. 

SHAPE  DETECTION  ALGORITHMS 
The  Blob  Algorithm 

The  BLOB  program  for  segmenting  pictures  has  been  used  successfully  in 
classification  of  mul ti spectral  data  and  more  recently  in  locating  aircraft 
in  ordinary  photographic  data.  The  regions  located  by  the  BLOB  program  are 
similar  in  both  mean  and  variance.  In  order  to  improve  BLOB  performance  using 
single  channel  data,  we  have  been  reexamining  the  pixel  group  statistics  used 
by  BLOB  with  regard  to  their  applicability  to  our  data. 

Mean  information  is  obviously  very  important,  and  correlates  well  with 
distinct  regions  as  defined  by  human  observers.  The  use  of  variance  in¬ 
formation  is  not  as  obvious  a  procedure,  and  we  have  recently  been  examining 
photographic  data  with  a  program  which  replaces  each  pixel  group  by  its 
standard  deviation,  simulating  with  BLOB  "sees"  when  it  does  its  variance 
test.  There  are  two  classes  of  pictorial  data  which  are  evident  when  var¬ 
iance  information  is  examined — those  which  were  created  by  imaging  processes 
with  sufficient  dynamic  range  to  represent  the  highest  gray  levels,  and  those 
which  "clip"  at  high  gray  levels. 

In  the  latter  case,  It  is  obvious  that  variance  information  is  very  much 
a  function  of  the  fraction  of  the  image  which  was  too  bright  for  the  imaging 
system.  In  the  former  case,  the  surprising  result  has  been  obtained  that, 
with  our  aerial  photographs,  the  variance  information  is  largely  a  duplica¬ 
tion  of  the  mean  information.  One  might  expect  that  some  regions  would  have 
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high  brightness  and  low  variance,  and  some  regions  low  brightness  and  high 

l 

variance,  but  this  is  not  what  generally  occurs.  In  fact,  low  brightness 
regions  almost  invariably  have  low  variance,  and  ,high  brightness  regions 
which  may  appear  completely  uniform,  as  a  painted  airplane  body,  have  rela¬ 
tively  high  variance. 

This  situation  may  be  partially  explained  by  the  logarithmic  character¬ 
istic  of  the  human  eye.  At  low  gray  levels,  variance  information  is  easily 
seen  as  the  eye  can  discriminate  between  closely  spaced  shades  quite  well  here. 
At  high  gray  levels,  variance  information  Is  obscured  by  the  insensitivity  of 
the  eye  to  gray  level  variations  of  even  tens  of  gray  levels.  (Our  pictures 
are  displayed  with  256  gray  levels.) 

After  taking  the  logarithm  of  a  picture,  the  quality  often  does  not 
appear  to  be  impaired  to  the  human  observer,  but  the  standard  deviations  of 
pixel  groups  appear  to  be  almost  random,  with  a  few  slightly  higher  variance 
regions  which  still  correspond  to  the  higher  mean  regions.  Taking  the  loga¬ 
rithm  again  often  does  not  result  in  serious  degradation  of  the  picture,  but 
the  variance  information  appears  to  be  random  noise! 

We  have  not  discovered  a  function  of  the  gray  levels  of  two  by  two  pixel 
groups  which  contains  useful  information  for  use  in  segmenting  the  picture, 
and  which  Is  not  largely  a  repetition  of  the  mean  information.  One  possibil¬ 
ity  is  to  retain  the  geometry  (contour-tracing  algorithm)  of  the  BLOB  program, 
and  remove  the  variance  test  from  the  program.  Using  this  version  of  BLOB 
perhaps  interactively  with  the  growing  and  shrinking  algorithms  described  in 
our  last  quarterly  report  may  provide  the  best  overall  performance  yet. 
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A  SEMANTIC-SYNTACTIC  APPROACH  TO  IMAGE  UNDERSTANDING 


G.  Y.  Tang  and  T.  S.  Huang 


INTRODUCTION 


We  propose  the  Injection  of  semantic  features  Into  a  context  free  gram¬ 
mar  for  the  purpose  of  understanding  an  input  signal. 

'  f 

A  feature  vector  Is  assigned  to  each  terminal  and  to  each  non-terminal. 

A  feature  transfer  function  is  attached  to  each  production  rule.  The 
feature  transfer  function  transfers  features  at  the  right  hand  side  of  the 
production  rule  to  the  left  hand  side  non-terminal.  After  parsing  the  feature 
vector  associated  with  the  root  of  the  derivation  tree  is  sent  to  a  discrimin¬ 
ating  function  to  determine  the  semantic  well-formedness  of  the  sentence. 

The  acceptance  of  an  input  signal  is  thus  based  on  not  only  its  syntactic 
structure  but  also  its  semantic  meaning. 

This  approach  is  applicable  to  many  problems  in  image  understanding.  In 
this  report  we  shall  present  only  a  simple  one-dimensional  example. 

AN  FXAMPLE 


We  use  this  approach  to  find  the  width  of  a  highway  or  the  location  of  an 
edge  in  aerial  photos.  The  grey  level  distribution  along  a  straight  line  seg¬ 
ment  crossing  the  highway  or  edge  is  obtained  by  a  film  scanner. 

The  apriori  knowledge  about  the  signal  is  that  it  looks  like  one  of  the 
four  paradigms  a,  B,  y,  o  in  Fig.  1.  a,  3  are  paradigms  for  ideal  edges,  y,  a 

are  the  paradigms  for  highways. 

The  grammar  describing  the  ideal  paradigms  is: 
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D  are  terminals. 


The  transformation,  which  brings  the  idea)  paradigms  to  the  realistic 

A 

level,  is  to  replace  each  occurrence  of  the  symbol  0  by  a  sentence  generated 
by  the  grammar: 
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D  is  the  start  symbol,  and  c,  d,  f  are  terminals. 
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The  transfer  functions  associated  with  the  production  rules  are  defined  as: 
FI  ;  A  BCD 


W  (A)  -  C  (D)  -  C  (C) 
C  (A)  -  C  (C) 


A  (A)  -  |  A  (6)  -  A  (C)  | 

Rl  (A)_  *  Max  (Max  (Rl(£)  ,  R1(0))  ,  A  (B)/  W(B)) 

F2  :  A  B  C 

A  ( A)  =  A  (B)  +  A  (C) 

W  (A)  -  W  (£)  +  W  (£) 

C  (A)  -£<»>  if.  1  <£>  "  0  . 

“  4c  (B)  +  C  (£))  /2  if  C  (£)  ¥  0 

Rl  (A)  -  Rl  (j») 

R2  (A)  -  R2  (B) 

F3  :  A  ->  B 

A  (A)  -  A  (B) 

W  (A)  -  W  (B) 

Rl  (A)  »  A  (B)/V? r  (B) 

R2  (A)  -  W  (B) 

C  (A)  -0 

F4  :  A  -*■  B  C 


A  (A)  -  Max  (A  (B),  A  (C)) 
W  (A)  -  W  (B)  +  W  (C) 

C  (A)  -  0 

F5  :  A  -*•  B 

•  —  “ 

C  (A)  -  0 

A  (A)  -  A  (B) 

W  (A)  -  W  (B) 


A  (A)  -  A  (C) 

W  (A)  -  W  (C) 

R1  (A)  -  R1  (C) 

,0  If  W  (B)  >,t 
R2  (A)  -  I  . 

^  R2  (C),  otherwise 

C  (A)  =  C  (C) 

F7  :  A  -*-  ££ 

R1  (A)  »  Max  (  R1  ( C) ,  A  (B_)/W  (B)) 
C  (A)  -  C  (C) 

W  (A)  -  W  (£) 

/I  If  R2  (C)  4  0 
R2  ( A )  -  < 

1  0  otherwise 
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-  1. 

The  underscored  symbols  like  A  are  formal  parameters.  Symbols  with  ~ 

atop  designate  features  associated  with  the  formal  parameters  In  parentheses, 

—  •* 

e.g.  C  (A)  is  the  C  feature  attached  to  A. 

There  are  five  terminals  a,  b,  c,  d,  f.  To  each  terminal,  there  are 

three  features  attached.  Literally  a,  b,  c,  d,  and  f  are  five  tendencies  in 

the  input  signal,  a  and  c  represent- the  tendency  of  going-up.  b  and  d  are 

for  going -down,  f  is  for  flatness.  The  extent  of  going-up  differentiates  a 

from  c.  a  stands  for  long  going-up.  c  stands  for  short  going-up.  Similarlly 

b  Is  long  going-down  and  d  is  short  going-down. 
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The  three  features  attached  to  the  terminals  are  A,  W,  and  C.  C  refers 

•* 

to  the  center  of  the  tendency.  W  refers  to  the  width  of  the  tendency.  A  is 

«•*  urn 

a  measure  of  the  opposition  (long/short).  A  (a)  ■  1  or  A  (b)  =  1  means  abso- 

«• 

iutely  long.  A  (c)  ■  I  or  A  (d)  ■  1  means  absolutely  short.  More  specifically, 
let  *.(•)  denote  the  height  of  the  tendency.  Then  for  a,  b,  we  have 
A  (S)  =  (£(S)/M1  -  t)/(l-t). 

For  c,  d,  we  have 

A  (S)  *  [  ^(Sj/Mj-O/t]  +  i. 

M^  Is  the  maximum  height,  t  Mj  is  the  threshold  for  discriminating  between 
"long"  and  "short". 

For  non-terminals,  there  are  two  more  features.  These  two  features  are 
defined  by  the  transfer  functions. 

The  final  semantic  well-formedness  test  Is: 

W  (0)  >  t,  ,  A  (0)  <  t3 
R1  (0)  >  t2 
R2  (0)  >  0. 

C  (0)  is  the  location  of  the  edge  or  the  front  edge  of  the  highway.  W  (0) 
is  the  width.  R2  (0)  »  1  indicates  edges.  R2  (0)  =  2  indicates  highway, 
tj,  t2,  and  t^  are  preset  thresholds. 

EXPERIMENTAL  RESULTS 

The  experimental  results  are  shown  in  Figs.  2-7. 


Fig.  2  Original  picture.  The  grey  level  distribution  along  the 
straight  line  are  the  input  signals.  Fig.  3  is  the  grey  level 
distribution  of  the  left-most  line  segment.  Fig.  4  is  the  one 
next  to  Fig.  3.  Fig.  5  is  the  one  next  to  Fig.  1*.  Fig.  6  is 
next  to  Fig.  5.  Fig.  7  is  next  to  Fig.  6.  From  Fig.  3  to  Fig.  7, 
we  look  for  highway  only. 
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Fig.  3  No  highway  is  found 


Fig,  4  Highway  is  defined  by  two  vertical  lines. 
The  width  of  highway  is  25  pixels. 
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Fig.  5  Highway  is  defined  by  two  vertical  lines 
Its  width  is  26  pixels. 


Fig.  6 


Highway  is  defined  by 
Its  width  is  31  pixel 


two  vertical  lines 


Fig.  7  No  highway  is  found 
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SYNTACTIC  SHAPE  RECOGNITION 


K.  S.  Fu  and  K.  C.  You 

INTRODUCTION 

Since  the  shape  of  an  object  ,  defined  as  its  outer  boundary,  provides 
important  information  from  its  structure  and  local  boundary  details,  the  syn¬ 
tactic  method  [1]  is  proposed  to  fully  utilize  this  information  for  recognition 
purpose.  In  the  last  report  [2],  the  primitive  descriptors  and  the  shape  gram¬ 
mar  have  been  discussed.  In  the  following  sections  of  this  report,  the  normal¬ 
ization  of  the  descriptors,,  the  implementation  of  a  bottom-up  parsing  scheme 
and  a  recognition  experiment  are  described. 

NORMALIZATION  OF  PRIMITIVE  DESCRIPTORS 

To  recognize  a  primitive  In  the  boundary,  we  simply  rely  on  the  similar¬ 
ity  measurement  between  the  primitive  descriptors.  But  the  digitization  In¬ 
troduces  different  noise  to  the  descriptor  with  respect  to  any  operation  of 
rotation,  scaling  and  translation.  Normally,  the  digitization  noise  Introduced 
due  to  rotation  is  much  more  significant  than  that  due  to  scaling  and  trans¬ 
lation,  because  the  rotation  influences  the  angle  feature  significantly. 
Although  we  can  apply  some  smoothing  techniques  to  the  boundary  vector  chain, 
the  true  boundary  can  not  be  recovered  in  all  cases  completely.  We  need  to 
study  the  possible  distribution  of  the  feature  values  under  various  rotations 
to  help  constructing  a  proper  similarity  measurement. 

In  following  paragraphs,  we  would  concentrate  on  the  influence  of  rotation 

to  the  shape  of  information  of  a  curve  primitive.  The  shape  Information  is 
C  c 

characterized  by  (  /L,  A,  S/L) ,  which  can  be  obtained  from  the  descriptor 
(£,  L,  A,  S)  [2]. 

The  normalization  function  is  defined  as  follows: 
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Definition  1: 

Normalization  function,  N,  of  the  curve  primitive  descriptor  is: 

N:  (C,  L,  A,  S)-  (X,  Y,  Z) 

where  X  ««  C/L  0  <_  X  <_  1 

Y  **  /2ir  (angle  in  terms  of  revolution) 

$ 

Z  -  /AL,  -0.5  <  Z  <  0.5  for  a  simple  curve  segment  [2] 

An  experiment  was  designed  and  carried  out  through  following  steps  to 
study  the  distribution  of  the  normalized  variables  under  different  rotations. 
Experiment: 

1.  A  picture  with  clear  boundary  was  scanned  with  respect  to  8  various 
rotation  angles. 

2.  Shapes  on  the  digital  pictures  were  traced  out  and  passed  through  a 
smoothing  procedure.  The  vector  chains  were  obtained. 

3.  Manual  extraction  of  primitives  from  the  chain  was  performed  via  an 
Interactive  procedure. 

A.  The  descriptors  of  the  manually  extracted  primitives  were  computed 
and  transformed  by  N. 

5.  Studied  the  distributions  of  the  normalized  variables. 

The  three  normalized  variables,  X,  Y,  Z,  constitute  a  three-dimensional 
space,  named  3~D  for  short.  Table  2.1  illustrates  the  pictures  used,  the 
curve  primitives  and  the  corresponding  symbols  In  following  figures.  Figure 
2.1  -  2.3  show  the  two-dimensional  displays  of  the  distributions  of  descriptors, 
each  symbol  in  the  figure  Indicates  the  position  of  a  descriptor  In  3"D. 

Several  interesting  aspects  are  observed: 

(1)  The  3  variables  well  characterize  the  shape.  Any  pair  of  clusters 
can  be  separated  in  at  least  one  of  the  2-dim.  displays. 
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(2)  The  number  of  points  is  not  enough  to  reveal  a  parametric  distribution, 
but  the  points  within  each  cluster  are  considerably  close  together. 

(3)  The  variable  Z  *  is  more  spread-oiit  than  the  other  two.  The  reason 
might  be  that  the  noises  in  A  and  L  are  accumulated  in  the  calculation  of  S, 
which  is  the  summation  of  the  partial  products  of  A  and  L. 

This  experiment  is  not  sufficient  to  demonstrate  the  distribution.  Besides, 
the  distribution  could  be  changed  with  the  boundary  smoothing  techniques. 

However,  this  experiment  gives  us  an  idea  to  construct  the  similarity  measure¬ 
ment. 

For  the  simplicity  of  computation,  we  may  assume  that  each  curve  primi¬ 
tive  has  a  reference  point,  or  prototype  point.  In  the  3“D.  The  descriptors 
of  a  subchain  of  vectors  are  compared  with  the  reference  points  via  a  distance 
measurement.  This  distance  measurement  between  two  shapes  can  be  defined  as 

Ds(q,.q2)- 

Definition  2; 

D.(q,,q2)  is  the  d'stance  between  two  curves  qj  and  q2,  where  N(q{)  « 

(x[»  Yj»  zj)»  and  ^(qj.q^"  func  (x|»  x2*  Yj»  Y2»  z|>  Z2^ • 

As  mentioned  in  [2],  the  relative  size  of  a  curve  segment  to  the  whole 

boundary  also  provides  information  for  identifying  the  primitives.  We  define 

another  measurement  for  the  size  difference. 

Definition  3: 

Sd(ql,q2^  measures  the  difference  between  the  relative  sizes  of  two 
curves  qj  and  q2» 

T(q, )  -  (Ct/L,,  Ll/Ll0,  a,,  St/L, )  [2] 
sd(q,,q2)  -  func  (li/l|c,  l2/l20) 
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IMPLEMENTATION  OF  A  RECOGNIZER 

A  conventional  syntactic  recognizer  contains  two  steps:  primitive  ex¬ 
traction  and  parsing.  The  former  step  converts  the  unknown  pattern  Into  a 
presentation  of  primitives  and  the  later  step  checks  the  structure  of  the  pre¬ 
sentation  with  the  grammar  rules.  If  the  structure  fits  the  rGles,  the  un¬ 
known  pattern  Is  accepted,  otherwise  rejected. 

Our  recognizer  combines  the  two  steps  Into  one,  the  advantages  would  be 
explained  later.  The  Earley's  parsing  algorithm  [3l  has  been  modified  to 
achieve  our  task.  For  recognition  purpose,  we  need  only  the  first  part  of 
Earley's  algorithm,  i.e.,  the  generation  of  parsing  table.  The  algorithm  Is 
modified  so  that  It  accepts  the  boundary  vector  chain  Instead  of  the  primitive 
string  as  Its  input.  In  other  words,  the  primitives  extraction  Is  embedded 
In  the  parsing  table  generation. 

The  context-free  shape  grammar  G  Is  of  the  form: 

G  -  (VN,  VT,  P,  S)  [2] 

where 

V-j.  *  {S,  N's|N:  nonterminals) 

Vj  m  {F's,  A's|F:  curve  primitive.  A:  angle  primitive} 

Before  discussing  the  algorithm,  some  definitions  are  given  as  follows: 
Definition  4: 

V|  Is  a  vector 

v,  -  -  v  Is  a  closed  vector  chain, 
i  n 

D(X)  Is  the  descriptor  of  a  terminal /nonterminal 

X  e  (  (VN  U  VT)  -S) 

D(i,J)  denotes  the  descriptor  of  the  subchain  Vj  -  -  v . , 

D(X)  s  D(l,J)  Implies  that  subchain  v^  —  v^  can  be  recognized  as  X 
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lj»  1 2 » • • * • *n+i  are  the  Parse  lists 

For  item  [A-*a  •  8,  I]  in  lJt  1  <  I  <  J,  means 

<1)  Iff  a^X,  (empty  string)  then  a  *>v,v.  ,  —  v. 

i  i+I  j 

(2)  if  a ■  X,  then  5  ■=  j 

The  following  assumption  makes  it  possible  to  recognize  a  primitive. 
Assumption: 

D(X)  -  D(i  J),  iff  Ds(N(X),  N(i,j))<  d.  and 
Sd(X,(!,j))  <  s,  where  d,  s  are  thresholds. 

The  Modified  Algorithm 

(1 )  Add  [S  •  a,  1  ]  to  I  ^ 
for  all  S  ->  a  in  P 

J  ■  1 

(2)  (a)  If  [N  a  •  B8,  i]  Is  In  K 

B  y  in  P 

add  [B  •  y,  j]  to  lj. 

(b)  [N  -*•  a  • ,  i]  Is  in  1 . 

then  for  al  I  [B  -*•  8  •  N  y»  k]  in  I  ] 
add  [B  -*■  B  N  •  y.  k]  to  1^ 

(3)  j  -  j+1 

if  j  >  n+1  go  to  (i») 

For  all  [N  a  •  X  B,  l]  in  lk,  1  <  k  <_  j  X  e  {F’s,  A's) 

(a)  if  8  *  X  and  D(X)  «.  D(k,j) 
then  add  [N  a  X  •  B,  i]  to  \. 

(b)  If  B-X,  D(X)  «  D(kJ)  and  0(H)  »  0(1, j) 
then  add  [H  +  9  X  •  ,  i]  to  lj 

go  to  (2) 

(k)  If  [S  -►  a  •  ,  1]  in  ln+j  for  some  a,  then  w  e  L(G) 


L 


3*1 

Por  all  S-*«.  in  P 
add  Cs-»*oC,lJ  to  Ix 


Por  I*,  (a)  If  CN^oC.Bji.i]  in  I.,  and  B-»t*  i: 
3  add  CB-fr-rj]  to  3I,. 

(b)  If  (N^*-,iJ  in  L 

than  for  all  (Bkp'Nt'.k)  in  I, 

add  (B-* (9 N't*  ,10  to  i:.  1 

J 


in  P 


3*3+1 


yasl?^ 

weL(G) 


w^L(G) 


STOP 


Por  all  (N-*oi-XM]  in  Ik,  : 

Lik^j,  X«[f*s,A's} 

M 

K-^=A,  D(X)=D(k,j), 

and  D(N)sD(i,j) 

add  [Jl-**X*  ,ij  to 

V 

(b) 

If  MX,  and  D(X)s;D(k 

.  j) 

add  CN-*oCX-p,i]  to 

V 
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The  following  example  illustrates  why  we  combine  the  primitive  extraction 
Into  the  parsing  scheme.  Let  us  see  the  step  (3. a).  In  extracting  Fj  of 
[N  -*■  a  •  Fj ,  iQ]  In  1^  from  the  vector  chain  D(Fj)  *  D(KQ,j)  and  D(N)  - 
D ( I Q , j )  may  be  valid  for  both  j  »  jj  and  j2.  In  other  words,  subchains 
v  -  -  v,  and  v  -  -  v.  are  candidates  for  F, ,  so  [N  -*■  a  F,  •  I.]  Is  In  I. 

Kq  J,  K0  J2  11°  -»1 

and  [N  -*•  a  F^  •  ,  i^]  Is  In  lj  .  Suppose  that  Jj  <  After  the  execution  of 
step  (2)  for  j  =  j*2,  [B  -*•  0  N  •  y,  K^]  is  In  lj  and  [B  +  M  •  y>  K^]  's  ,n 
lj  where  <_  IQ  _<  KQ  <_  jj  <  j2<  Suppose  that  y  *>  ajF2A2F3’  t*ie  execut*on  °f 

step  (3.b)  to  extract  Aj  of  [B  0  N  •  A]F2A2F3’  Kl^  'n  1  j  and  *j  from  the 
vector  chain  may  find  out  that  D(A^)  i  D(j^,j^)  for  any  >  jj,  but  D(A^)  * 

D  ( j  2  *  j  |^)  for  some  >  J2>  then  only  [B  -*•  0  N  Aj  •  F2A2F3’  Kl^  is  added  to  • 
That  is,  the  context  information  Is  used  to  select  the  subchain  v 


v. 

J. 


for  F,  and  discard  v„ 


-  V 


j  •  If  D(Aj)  «  D(j j ,J j)  for  some  j ^  >  j^,  and 
D (A^ )  «  OOj.J^)  for  some  >  J2,  then  [B  +  0  N  Aj  •  f2A2F3’  K|^  ls  ,n  'j 
and  [B  +  0  H  Aj  •  F2A2F3*  Kl^  is  ln  *j  *  execut,on  step  (3.b)  to 

extract  F 2  from  the  vector  chain  may  find  out  that  D(F2)  i  D(j^,J^)  for  any 
J5  1  j3  >  j2,  but  D(F2)  «  DO^.jg)  for  some  jg  Uj,  >  J,.  then  only  [B  -*■  0  N 
a,F2  •  A2Fj,  K,1  is  added  to  lj  .  That  is,  the  lookahead  on  the  information 


J. 


for  Fj . 


of  subchain  v.  -  -  v.  selects  the  candidate  v„  -  -  v 

Jl  J5  "o 

In  fact,  the  extraction  of  A's  and  F's  embedded  In  our  parsing  is  dif¬ 
ferent  from  the  pre-extraction  of  the  primitives  without  knowing  the  context 
Information.  The  advantage  is  that  the  extraction  would  be  much  more  accurate 
in  a  global  sense. 

EXPERIMENTAL  RESULTS:  SHAPE  RECOGNITION  OF  THE  TOP  VIEWS  OF  AIRPLANES 


The  main  purpose  of  this  experiment  is  to  testify  whether  the  proposed 
method  is  capable  of  recognizing  shape.  The  Idea  Is  to  recognize  noisy  shapes 
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from  digital  images  by  using  the  modified  Earley’s  parsing  algorithm  with 
given  models  and  shape  gramma rsk 

In  constructing  shape  grammars,  we  must  consider  how  to  obtain  a  better 
segmentation  of  a  shape.  Since  the  picture  Is  scanned  line  by  line,  the  con¬ 
vex  points  of  the  object  are  usually  the  first  point  hit  in  the  scanning,  and 
hence,  the  starting  point  of  the  vector  chain.  Therefore,  our  grammars  and 
breaking  points,  which  break  the  shape  into  small  curve  segments,  are  designed 
such  that  most  of  the  possible  sentences  with  respect  to  different  starting  points 
are  contained  in  the  generated  languages. 

If  a  noisy  picture  is  unfortunately  distorted  at  a  certain  breaking  point, 
the  descriptors  of  the  two  related  curve  segments  may  be  badly  affected, 
because  the  direction  of  the  two  ends  of  a  curve  segment  affects  the  feature 
values  of  the  descriptor.  In  such  cases,  the  machine  may  misrecognize.  The 
solution  is  to  utilize  more  than  one  set  of  segmentation.  If  a  breaking  point 
with  respect  to  one  segmentation  is  distorted  badly,  it  can  be  passed  around 
by  another  set  of  segmentation,  because  it  may  not  be  the  breaking  point  in 
that  segmentation.  The  noise  occured  not  at  the  ends  has  little  effect.  In 
this  experiment,  each  of  the  grammars  utilizes  essentially  two  sets  of 
segmentation. 

A  less  noisy  shape  may  be  recognized  by  more  than  one  set  of  segmentation. 

In  other  words,  there  may  be  more  than  one  derivation  for  a  sentence.  This 
intended  ambiguity  [l]  increases  the  recognition  power  for  noisy  shapes. 

Nine  noisy  shapes  have  been  tried  against  three  grammars,  three  models 
of  airplanes,  with  the  modified  Earley's  parser.  All  of  the  27  tests  were 
correctly  accepted  or  rejected. 


r 


r 


i 


i 


» 


i 


» 


The  following  pages  are  the  shape  grammars,  and  their  corresponding  seg¬ 
mentation  graph.  The  nine  noisy  shapes  are  shown  In  Ftgure  4.7  -  4.15.  The 
small  circle  at  the  bottom  of  each  shape  Indicates  the  starting  point  of  the 


boundary  vector  chain. 


i 

\ 
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The  shape  grammar  for  AIRBUS*  Gb  =  (vb*  Pb*  Sb) 

vlvSil1*1**} 

V  {Pbj'  *bk|  1*3«22,  ISkilo) 

V 

^b-*’  NblAblFblAb2Nb2Ab3Nb3Ab3Nb4Ab2Fb2AblNb5Ab4 

(2)  Sb->  FblAb2Nb2Ab3Nb3Ab3Nb4Ab2Fb2AblNb5Ab4NblAbl 

(3)  Sb->  Nb2Ab3Nb3Ab3Nb4Ab2Fb2AblNb5Ab4NblAblFblAb2 

(4)  Nb3Ab3Nb4Ab2Fb2AblNb5Ab4NblAb1FblAb2Nb2Ab3 

(5)  Sb— ►  Nb4Ab2Fb2AblNb5Ab4NblAblFblAD2Nb2Ab3Nb3Ab3 

(6)  S b— >  Fb2AblNb5Ab4NblAblFblAb2Nb2Ab3Nb3Ab3Nb4Ab2 

(7)  *>  Nb5Ab4NblAblFblAb2Nb2Ab3Nb3Ab3Nb4Ab2Fb2Abl 

(8)  Pb3Ab5Fb4Ab6Fb5Ab7Fb6 

(9)  Nbi"^  Fb7Ab8Fb8Ab7Fb6 

(1°)  Nb2“*  Pb9Ab9PblOAblOPbll 

(11 )  NbZf->  Fw  2Xbl  QFbl  -Ab9Fbl  ^ 

(12)  Nb3->Fbl5AbllFbi6 

(13)  Nb5“>Fbl7Ab7Fbl  8^6^!  9^5^20 

(14)  Nb5-*Fbl7Ab7Fb21Ab8Fb22 


-The  shape  grammar  for  DC8,  G,  =  (  VJt  TJt  P 

•  CL  U  Q 

V{Sd-  "dil1*1*1*} 

V  {Pdi-  Adkl 

Pd* 

(1 )  Sd~>  Nd2Ad2Fdl Ad2Nd3Adl  Nd4Ad3 

(2)  Sd  >Nd2Ad2FdlAd2Nd3Adl Nd4Ad3NdlAdl 

(3)  Sd— »  pdiAd2Nd3AdlNd4Ad3NdlAdlNd2Ad2 

(4)  Sd— *  Nd3AdlNd4Ad3NdlAdlNd2Ad2FdlAd2 

(5)  Sd— #-  Nd4Ad3NdlAdlNd2Ad2FdlAd2Nd3Adl 

(6)  Sd— »  Nd6Ad2FdlAd2Nd7Ad4Nd8Ad3Nd5Ad4 

(7)  Sd— ♦  Nd8Ad3Nd5Ad4lfd6Ad2FdlAd2Nd7Ad4 

(8)  Ndl“* Nd9Ad5 Ndl 0 


(9)  Nd5^Nd9Ad5Ndll 
(1 ° )  Nd9-*  Ndl 2Ad6Fd2 

(11)  Ndi2“*Fd3Ad7Fd4 

(12)  Ndl 2* Fd5Ad8Fd6Ad9Pd? 

(13)  Ndl (T* Fd8Adl 0 Fd9 

(14)  i*- r  pdt o Adi l Pdi  1 

(15)  ^d2”^  Pdl  ^dl  2Pdl  3 


1 


Nd6"~>  Fdl4Adl2Fdl3 

1  (17)  Nd3“>Fdl5Adl2Pdl6 

(i  8 )  Nd7"^  Pdl  5Adl  2pdl  7 
(19)  Nd4“^  Ndl3Ad5Ndl4 
(2°)  Nd8“” >  Ndl5Ad5Ndl4 
(21^  Ndl3”>Pdl8AdlOPdl9 
Ndl5"*  Fd20AdllPd21 
(23)  Ndl4^  Pd22Ad6Ndl5 

|  *22f)  Ndl5_>  Pd23Ad9Fd24Ad8Fd25 

(25)  Ndl5“*‘Fd26Ad7Fd27 

8 


The  shape  grammar  for  BAC  1-11 ,  G_  =  (  V.,  T 
Tc*{se-  Nci|1<i<8} 

*c  *{'«!•  Ack|  UW?) 

pe  1 

(1)  Sc — ►NciAclNc2Ac2PclAc2Nc3AclNc4Ac3 

(2)  Sc — >Nc2Ac2PclAc2Nc3AclNc4Ac3NclAcl 

(3 )  Sc  > Fcl Ac2Nc3Acl Nc4Ac3Nc1 Acl Nc2Ac2 

(4)  Sc  >  Nc3AclNc4Ac3NclAclNc2Ac2PclAc2 

(5)  Sc — >-Nc4Ac3NclAclNc2Ac2PclAc2Nc3Acl 

(6)  sc— >  nc6ac2pc1  ac2nc7ac4nc8ac3nc5ac4 

C7 )  Sc  >  Nc8Ac3Nc5Ac4Nc6Ac2Pd Ac2Nc?Ac4 
(8>  Ncl— >Fc2aC5PC3 

(9)  Nc5~>Fc2Ac5Pc4 

(10)  NC2— Pc5Ac6Fc6Ac?Pc7 

(11)  Nc6->  PC8Ac6Pc6Ac7Fc7 

(12)  nC3— >  Pc9Ac7Pcl0Ac6Fcll 

(13)  nC7">  Fc9Ac7Fc10Ac6Pc12 

(14)  Ncjf->Fcl3Ac5Pclif 

(15)  Nc8”>  Pcl5Ac5Pcl4 
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IMAGE  CLASSIFICATION  EMPLOYING  STATISTICAL  CONTEXT 


E.  F.  Kit  and  P.  H.  Swain 


INTRODUCTION 


Useful  information  from  remote  sensing  data  is  obtained  by  accurately 
classifying  each  region  In  the  image.  In  the  past,  work  dealing  with  the 
computer  analysis  of  image  data  collected  from  high-flying  aircraft  and  earth 
orbiting  satellites  has  focused  on  classifying  each  point  in  the  image  (on 
the  ground)  in  terms  of  its  spectral  variation  in  electromagnetic  field 
strength.  Spectral  classifiers  generally  classify  a  single  image  point  based 
solely  on  data  associated  with  that  point. 

Recently  the  focus  has  been  broadened  to  employ  spatial  relationships  in 
the  data.  Even  an  amateur  photo  interpreter  would  concur  that  this  informa¬ 
tion  component  is  substantial.  For  example,  automobiles  occur  more  often  on 
highways  than  in  water,  and  boats  are  more  often  in  water  than  on  highways. 
Then,  even  if  their  spectral  appearance  is  the  same,  boats  can  be  discrim¬ 
inated  from  automobiles  if  their  context  is  considered.  The  data  immediately 
surrounding  an  image  point  is  intimately  associated  with  it  and  can  be  used 
to  throw  light  upon  its  true  nature. 

One  approach  to  the  statistical  treatment  of  context  was  suggested  by 
Welch  and  Salter  [1],  However  their  basic  assumption  (made  in  an  effort  for 
a  practical  solution)  was  that  contextual  relationships  between  nonadjacent 
cells  are  negligible.  They  later  went  on  to  make  extensive  comparisons  be¬ 
tween  the  recognition  performance  of  a  four  neighbor  rule  and  an  eight  neigh¬ 
bor  rule. 

The  model  we  have  formulated  is  based  on  a  significantly  different  ap¬ 
proach  to  spatial  dependencies.  Contextual  configurations  of  arbitrary  shape 
(see  Fig.  I)  can  be  dealt  with  by  out  model  and  statistical  dependencies 


extracted  from  a  typical  scene  uSed  to  classify  a  new  scene  are  less  complex 
and  more  readily  determined  than  those  required  by  the  Welch  and  Salter  ap¬ 
proach.  The  block  diagram  shown  in  Fig.  2  summarizes  our  initial  experiment 
and  in  the  next  section  we  focus  on  context  extraction. 


Fig.  1.  Contextual  Neighborhoods  of  Arbitrary  Shape 
Context  Extraction 

Recall  the  context  classification  class  discriminant  function  as  develop¬ 
ed  In  the  previous  progress  report: 

r  [  n  f  (xj]  Gp(0) 

0,  i=l  ei  r 

0J-0 

where  0j  and  Xj  are  respectively  the  state  and  measurement  of  the  ith  cell  in 
the  p  array  and  0j  and  Xj  refer  to  the  cell  being  classified. 

Tabulation  of  the  frequency  distribution,  Gp(0),  for  typical  scenes  of 
interest  began  as  a  challenging  problem  because  of  the  growth  of  memory  re¬ 
quirements  due  to  large  window  size  (context  neighborhood)  and  large  number 
of  classes.  Consider  the  pixel  array  In  Fig.  3  and  the  simple  case  where  the 
context  window  size  is  three  and  number  of  classes  is  two  as  shown  In  Fig.  4. 


Fig.  2.  Block  Diagram  of  Initial  Experiment 
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To  tabulate  the  frequency  distribution,  the  specified  window  is  placed  over 
the  pixel  array  beginning  in  the  upper  left  corner.  A  count  associated  with 
this  combination  of  classes  Inside  the  window  is  incremented.  The  window  is 
moved  one  pixel  to  the  right,  and  the  process  repeated  until  the  specimen 
pixel  array  has  been  exhausted. 

The  simplest  solution  is  to  reserve  memory  for  each  possible  combination 
and  to  store  the  associated  frequency.  Consider  the  fairly  typical  situation, 
however,  in  which  the  window  size  is  nine  (three  by  three  neighborhood)  and 
the  number  of  classes  is  seventeen.  Each  three  by  three  block  can  take  one 
of  17  to  the  ninth  power  (or  120000000000)  possible  configurations  and  the 
simple  approach  is  obviously  infeasible. 


A  and  F  represent 
distinct  classes 


A 

F 

A 

A 

F 

A 

F 

F 

F 

A 

A 

F 

F 

A 

A 

A 

Fig.  3.  Sample  Pixel  Array  for  Frequency 
Distribution  Determination 

Note  that  a  given  scene,  having  finite  size,  must  contain  only  a  small 
fraction  of  this  huge  number.  For  example,  consider  classifying  a  200  pixel 
by  200  pixel  scene.  In  the  worst  case  there  are  40000  different  combinations 
present.  Many  of  these  40000  will  be  repeated  often,  further  reducing  the 
number  of  unique  combinations.  Only  combinations  which  actually  occur  need 


be  stored 


Window  Size  ■  3 
Number  of  Classes  «  2 


Possible  Configurations 


Fig.  4.  Sample  Context  Problem 

Figure  5  summarizes  the  presently  Implemented  approach.  Each  time  a 
combination  Is  obtained,  the  binary  tree  Is  traversed  until  the  combination 
Is  found  or  until  a  null  node  is  encountered.  If  the  combination  is  already 
containeu  In  the  tree,  the  count  associated  with  that  combination  is  incre¬ 
mented.  Otherwise  this  is  the  first  time  this  combination  has  occurred,  and 
therefore  a  new  node  is  added  to  the  tree  and  Its  associated  count  is 
initial ized  to  one. 

For  our  first  frequency  tabulation,  the  frequency  distribution  was  de¬ 
termined  from  a  classification  114  columns  by  251  lines.  The  total  number  of 
nodes  in  the  final  binary  tree  was  found  to  be  18502.  The  ratio  of  the  actual 
number  of  unique  combinations  to  the  possible  number  of  unique  combinations 
(a  function  of  scene  size)  provides  an  indication  of  the  "amount"  of  context 
in  the  scene.  In  this  case  our  "scene  context  ratio"  was: 

scene  context  ratio  ■  18502  ■  *  .65 

114  x  251 


I.  Input 


a.  Classification  Results  File 

b.  Window  Configuration 

c.  Image  Size 


2.  Get  Combination 


3.  Tree 


Combination  Found 
a.  Increment  Count 

Null  Node 

a.  Create  Node 

b.  Count  ■  1 


4.  If  More  Combinations  Go  to  2 


5.  Print  Tree  Using  Inorder  Traversal 

Fig.  5.  Binary  Tree  Implementation 

The  scene  context  ratio  is  always  greater  than  zero  and  less  than  or 
equal  to  one.  For  a  scene  with  frequently  occurring  combinations,  the  numer¬ 
ator  will  tend  to  be  small,  resulting  In  a  low  scene  context  ratio  and  in¬ 
dicating  a  high-context  scene. 

Context  Classification  and  Evaluation 

The  Context  Classification  algorithm  is  a  supervised  classification 
algorithm  since  it  requires  training  samples  representative  of  each  class  of 
Interest.  In  our  approach,  the  classes  are  assumed  characterizable  by 
multivariate  normal  probability  density  functions.  Therefore  the  training 
samples  are  used  to  estimate  mean  vectors  and  covariance  matrices  for  each 
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The  goal  of  the  Context  Classifier  is  exactly  the  same  as  the  familiar 
Maximum  Likelihood  Classifier  (MLC)  and  is  shown  in  Fig.  6. 


nxmxv  nxm 

!R  *  1R 

Fig.  6.  MLC  and  Context  Classification  Goal 


The  number  of  lines,  columns,  and  channels  in  the  image  are  represented  by  n, 
m,  and  v  respectively.  These  classifiers  are  responsible  for  assigning  a 
given  Input  pattern  to  a  class  by  comparing  the  feature  vector  with  a  set  of 
training  patterns.  The  common  performance  measure  among  Context  and  con¬ 
ventional  ML  Classifiers  is  "minimum  probability  of  error". 

The  important  input  difference  which  distinguishes  the  Context  Classi¬ 
fier  is  shown  In  Fig.  7. 


number 
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channels 
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channels 


Fig.  7  Input  Difference  Between  Context 
and  ML  Classifiers 
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For  the  full  window  shown,  where  q  ■  r  ■  3»  the  Context  Classifier  inputs 
nine  times  the  information  per  poinrt  of  MLC.  As  mentioned  earlier,  any  sub¬ 
set  of  the  full  window  may  be  easily  used,  and  if  q  *  r  ■  I,  the  Context 
Classifier  Is  precisely  the  same  as  the  HLC. 

Software  supporting  the  Context  Classifier  has  been  developed  and  is 
currently  under  test  using  mul tl spectral  remote  sensing  imagery.  Results  of 
applying  this  approach  to  classification,  including  qualitative  and  quantita¬ 
tive  comparisons  with  MLC  results,  will  appear  in  the  next  progress  report. 


REFERENCE 

[1]  J.  R.  Welch  and  K.  G.  Salter,  "A  context  algorithm  for  pattern  recogni¬ 
tion  and  image  interpretation,"  IEEE  Trans,  on  Systems,  Han  and  Cyber¬ 
netics,  Vol.  SMC-1,  No.  1,  January  1971* 


88 


A  SPATIAL  STOCHASTIC  MODEL  FOR  CONTEXTUAL  PATTERN  RECOGNITION 

T.  S.  Yu  and  K.  S.  Fu 

Abstract 

A  contextual  classification  algorithm  using  a  spatial  stochastic  model 
(Markov  random  f;eld)  's  proposed.  The  requirements  for  the  joint  probabil¬ 
ity  function  on  the  two-dimensional  lattice  are  discussed.  The  distinction 
between  the  spatial  correlation  context  and  the  transition  probability  con¬ 
text  is  made.  Conceptually  clear  procedures  for  construction  of  the  model 
are  given.  The  coding  technique  for  parameter  estimation  is  presented.  An 
extension  of  the  model  in  the  multivariate  site  variable  case  is  derived  to 
handle  the  mul  tispectral  satelMte  data.  Experiments  with  remote  sensing 
data  are  performed  and  results  are  compared  with  simple  (no  context)  rule 
results.  Less  frequently  occurring  classes  such  as  highways  and  commercial 
areas  were  found  to  be  classified  better  using  the  contextual  algorithm  with 
only  a  reasonable  increase  in  time  computation. 

1.  Introduction 

Compound  decision  theory  has  enabled  us  to  introduce  contextual  infor¬ 
mation  into  the  statistical  decision  procedure  for  image  interpretation. 
Based  on  initial  success  in  using  context  information  C11,  C2D,  the  investi¬ 
gation  has  been  extended  and  is  go’’ng  beyond  the  simple  (no  context)  clas¬ 
sification  to  improve  the  decision  procedure  in  areas  where  the  need  is 
clear  and  the  possibilities  are  promising.  The  outcome  of  this  investiga¬ 
tion  has  been  a  conclusive  demonstration  of  the  feasibility  of  applying  a 
spatial  stochastic  model  to  analyze  digital  satellite  data.  The  effective¬ 
ness  of  this  model  was  demonstrated  by  the  minor  increase  4n  computation 
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time  and  the  improved  classification  results.  Some  mod if ’cations  and  exten¬ 
sions  of  the  model  were  specifically  developed  or  substantially  refined  dur¬ 
ing  this  investigation  to  cover  more  general  situations.  A  wide  variety  of 
agricultural  areas  and  urban  residential  areas  were  chosen  for  experiments. 
Other  classes  in  the  satellite  data  include  pastures,  forests,  water  bodies 
and  highways. 


2.  Proposed  Approach 

Under  appropriate  assumptions,  the  derived  compound  decision  rule  C2] 
is  to  choose  act4on  a  e  A  to  minimize 

53  L(ek,a)p(xk/ek)G<ek)  n  23  P(xk  /ek  )P(0k  /ek>  <1> 

\  <=1  9kj  *  *  f 

where  ek  e  e  is  the  kth  class,  xfc  is  the  pattern  vector  of  cell  k,  G  is  the 

a-priori  probability  function  and  L  is  the  loss  function.  Suppose  we  let 
4  V* 

G.(ek)  =  G(©k)  n  Z,  P(xfc  /ek  )P(@k  /0|).  The  decision  rule  Ci  becomes 

’=1  ek.  4  1  4 

1 

53  L(ek,a)P(xk/0k)G1<ek)  .  (2) 

9k 

Rule  (2)  is  of  the  same  form  as  the  simple  decision  rule  except  for  the  a- 
priori  probability  G^(0k>.  *ote  that  G^<ek)  is  now  the  probability  of  oc¬ 
currence  of  the  four  nearest  neighboring  states  of  @k  together  with  that  of 
@k.  Since  the  neighboring  states  are  unknown,  we  have  to  obta4n  the  proba¬ 
bility  of  the  neighboring  patterns  belonging  to  each  individual  class  and  of 
each  individual  class  being  a  neighbor  of  0k»  The  four  multipliers  ;n  the 
product  term  in  (1)  represent  the  contextual  contributions  from  the  four 
neighboring  cells.  Context  information  in  terms  of  directional  transition 


» 


r 
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probabilities  is  introduced  in  tins  decision  scheme. 

The  first  problem  with  this  scheme  is  the  vaMd’ty  of  the  estimated 
directional  transition  probabil ities  from  training  data.  Usual!/  the  ground 
truth  information  is  never  known  with  certainty,  especially  with  satellite 
data. 

Another  problem  is  that  we  assume  the  contributions  from  the  four 
neighbors  are  independent.  In  real  situations  this  is  not  generally  true, 
for  example,  agricultural  areas  are  usually  present  on  both  sides  of  a 
river. 

Instead  of  using  the  transition  probabilities,  we  are  interested  in  us¬ 
ing  the  correlations  between  different  patterns  since  the  correlation-type 
context  does  not  require  us  to  know  the  true  classes  of  the  patterns.  In 
particular,  we  would  I  ike  to  obtain  the  joint  density  function  of  and  its 
four  nearest  neighbors  (Figure  1)  because  the  joint  distribution  function 


*7 

K11 

*2 

*6 

K12  K3 

K 

K1  K10 

K8 

*4 

*5 

Figure  1.  Order 

*9 

of  neighborhood  of  cell  K 

contains  all  the  correlations  between  alt  pa4rs  of  cells.  In  later  classif¬ 
ication  processes  we  shall  concentrate  on  the  observation  vector 

Xk  =  *xk'xk  'xk  /Xk  /Xk  *  4nstead  of  just  on  x^  to  classify  cell  k. 

1  2  3  A 

Notice  that  this  joint  density  function  must  be  defined  consistently. 
When  we  try  to  classify  cell  ,  we  need  to  use  the  same  joint  density 
structure  as  when  we  classify  cell  K.  This  implies  that  the  marginal  joint 
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d’ st r •'button  functions  of  cells 


Figure  2.  Two  NB  (nearest  neighbor)  system  samples 

overlapping. 

xk,xfc  should  be  the  sar.»  as  those  of  cells  x^,xk  (see  Figure  2).  In  the 
Gaussian  case,  th’S  means  that  the  variances  of  cells  K,K1,K^  are  all  the 
same  and  that  the  correlations  between  K,K^  are  the  same  as  those  between 
K,K^.  In  general,  we  must  require  that  the  labelling  of  the  different  cells 
should  not  affect  the  definition  of  the  jo4nt  density  functions  for  these 
cells.  Also  we  must  require  that  all  the  variance-covariances  be  distance 
dependent  but  not  position  dependent.  For  example,  (see  Fig.  1)  the  covari¬ 
ances  between  cells  K,Kg  should  be  the  same  as  those  between  cells  K.j,K^ 
since  the  distance  between  either  pair  is  the  same.  This  requirement  is 
commonly  known  as  "stationarity". 

We  need  to  construct  a  two-dimensional  random  process  with  the  sta¬ 
tionarity  property.  In  particular,  we  are  mostly  interested  in  the  Gaussian 
plane  process  because  the  gaussian  density  functions  can  be  specified  by  the 
second  moments.  This  means  that  we  should  be  able  to  specify  covariances 
between  any  pair  of  cells  (or  sites)  in  the  lattice  after  constructing  the 
stationary  random  process. 
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3.  A  Markov  Process  on  a  Torus 

The  procedure  for  constructing  a  two-dimensional  stationary  process  is 
similar  to  the  one-dimensional  case.  Here  we  first  construct  a  similar  pro¬ 
cess  on  a  lattice  torus  *n  which  all  random  variables,  x  ,  are  identically 

mn 

normally  distributed  and  satisfy  a  Markovian  property.  Then  we  let  the  size 

of  the  torus  tend  to  infinity  to  form  the  stationary  plane  process.  Let  the 

torus  be  defined  by  all  pairs  of  integers  (j,k>,  j=G,...,p-1,  k=0,...,q-1 

where  p  and  q  are  identified  with  zero.  There  are  pq  random  variables  x.. 

J  * 

at  points  (j,k)  and  we  suppose  that  they  have  a  joint  normal  distribution 
with  the  density  function 

p(*)  =  K  expkpCr  x?k 


+  a  I  xjk(xjv|,k+xj-1,k+xjA+1+xj,k-1>>:1  G> 

where  |a|  <  ,  and  K  is  a  constant.  The  quadratic  form  inside  the  brackets 

can  be  shown  to  be  positive  definite. 

This  torus  process  has  been  discussed  separately  by  Besag  [T]  and  Moran 
C4D.  Besag  presented  a  conditional  probability  approach  to  specify  the  spa¬ 
tial  process  and  derived  the  joint  density  function  which  is  equivalent  to 
(3).  Here  we  follow  Moran's  procedure  for  constructing  the  desired  plane 
process  since  it  is  brier  and  clear. 

We  now  find  the  variance-covariance  matrix  of  the  distribution  of  the 
X's  by  inverting  the  matrix  of  the  coefficients  of  the  quadratic  form  in 
(3).  We  write  the  pq  random  variables  as  a  vector  X  equal  to 

^X00/X01 '*  *  *'X0  q-V  X11  *  *  *  *^X‘]  *  •  •  •  ,xp— 1  q— 1  ^  ^  write  the  matrix  Q  as 

(C  )  where  C  is  the  element  in  row  u.q+v.  and  column 

Uj-Uj/V^-Vg  u^-u^^v^-Vj  1  1 

u2q+v2  where  u^  =  0,1,...,p-1  and  v^  =  0,1,...,q-1  and  their  sums  and 
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differences  are  taken  modulo  (p)  and  (q)  respectively. 

We  construct  the  matrix  Q  as  follows.  Consider  the  p  x  p  circulant  ma¬ 
trix 


x  y  0  • 
y  x  y  0 


*  y 

•  0 


y  0  0 


y  x 


and  replace  the  x,  y  and  zero  by  A,  8  and  zero  q  x  q  matrices  respectively 
where  A  is  the  p  x  p  matrix 

‘1  a  0  •  •  a" 

a  1  a  •  •  0 

0  a  1  •  •  0 

•  ••••• 

a  0  •  0  a  1_ 

and  B  is  a  p  x  p  diagonal  matrix  with  a's  down  the  main  diagonal.  Not’ce 
that  the  Q  matrix  here  is  exactly  the  same  as  Besag's  B  matrix  in  his  paper. 

Next  we  can  do  a  unitary  transformation  to  diagonalize  Q  in  order  to 
find  its  inverse.  We  only  give  the  final  result  here.  Write  Q  ^  =  (bst) 


where  s  =  u^q+v^,  t  =  u^p+v^.  Then 


bst  =  53  <pq>  1  exp  Zititu^j  p  n+v,jV^  q_1> 


-1 


-1. 


•  <1+2a  cos  2 it  v^q  ^a  cos  2ir  u^q-1>-1 


-1  -1 

•  exp  -2iri<u.,u^P  +v^v^q  > 


(4) 


where  m  =  p^q+v^. 


» 
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This  is  real  and  is  the  variance-covariance  matrix  of  the  random  variables 


on  the  torus. 


4.  A  Markov  Process  on  a  Plane  Lattice 


Our  goal  is  to  construct  a  stationary  Markovian  Gaussian  process  on  the 
lattice  -formed  by  all  positive  and  negative  integers  (m,n) .  To  do  ttrs  it 
is  sufficient  to  define  such  a  process  for  all  finite  sets  of  such  points 
provided  the  resulting  distribution  is  invariant  under  translation  (con¬ 
sistency  requirement).  It  is  therefore  sufficient  to  consider  any  fixed 
finite  set  of  points  on  the  torus  lattice  and  then  let  p  and  q  tend  to  in¬ 
finity.  Writing  exp  2iri  u^p  1  =  and  exp  2iri  u^q  1  =  z2  and  letting  p,q 

tend  to  infinity,  we  find  that  the  covariance  .  between  x_  and  x .. 

'  s,t  m,n  m+s,n+t 


_s-1  t-1 

^  f  f  - -Vr1-— 

4ir  12^1=1  | z2 1  =1  C1+a(z.j+z11)+a(z2+Z2  '3  1  2 

with  the  convergence  being  uniform  on  the  finite  set  of  points. 
We  can  obtain  the  covariances  explicitly  from  the  integral 


„  2 n  2* 

. .  .-L/  S 

4  it  0  0 


cos  se^  cos  teg 
tl+2a  cos  e<|+2a  cos  ep  d01d02 


Vqq  can  be  obtained  C43  analytically  as 


2w  /  (1-16  a2  sin2e)^~  de 


which  is  (2*>-1  times  the  complete  elliptic  integral  of  the  first  kind,  ta¬ 
bulated  in  Abramowitz  and  Stegum  C5]. 
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vo-i  s  voi  =  vio  =  v-io  =  1  a  a~'ioo) 


Other  values  of  V  .,  which  are  known  as  the  lattice  Green's  function,  are 
s,t 

much  more  complicated  to  evaluate.  We  used  summations  to  approximate  the 
integrals  in  (6)  to  obtain  Vg ^  and  for  our  NB  system. 

5.  Estimation  of  the  Spatial  Correlation  Parameter 


The  parameter  "a"  in  (3)  is  unknown  and  must  be  estimated  from  training 
samples.  There  is  more  than  one  way  to  do  the  parameter  estimation,  for  ex¬ 
ample,  see  C3]  C6].  However  we  will  describe  a  coding  method  which  is  very 
flexible  and  simple. 

We  have  mentioned  that  Besag  used  a  conditional  probability  approach  to 
define  the  torus  process.  Specif ical I y  he  derived  the  most  general  (but  con¬ 
sistent)  conditional  distribution  function  and  went  on  to  arrive  at  the 

joint  distribution  for  all  cells  in  the  rectangular  lattice.  The  condition^ 

2  2 

al  distribution  he  derived  was  PCx^/al  I  other  sites)  =  (2ir a  ) 

-1  -2  2 

expCy-o  ^X|(~U|C  I  p^-fx.-uJ  1,,e  Mar^ov  definition  used  for  defining 

this  process  was 


P(x^/al I  other  cel l s) 


=  P(x^/4  nearest  neighboring  cells  k) 

In  order  to  fit  a  first  order  scheme,  we  begin  by  labelling  the  interior 
sites  of  the  lattice,  alternately  x  and  •,  as  shown  in  Figure  3.  It  is 
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Figure  3.  Coding  pattern  for  a  first-order  (NB)  scheme. 
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immediately  clear  that,  according  to  the  first-order  Markov  assumption  the 
variables  associated  with  the  x  sites,  given  the  observed  values  at  all  oth¬ 
er  sites,  are  mutually  independent.  This  results  in  the  simple  conditional 
I ikel ihood. 


(9) 


for  the  x  site  values,  the  product  being  taken  over  all  x  sites.  The  condi¬ 
tional  Maximum  Likelihood  Estimate  of  the  unknown  parameter  can  be  obtained 
in  the  usual  way. 

The  defined  process  requires  that  every  variate  (associated  with  each 
cell)  be  identically  normally  distributed.  However,  in  practical  situations 
such  as  our  satellite  data,  there  are  many  classes  appearing  in  the  image 
and  they  are  statistically  characterized  by  different  mean  vectors  and  co- 
variance  matrices.  For  the  single-variate  site  variable  case  discussed  so 
far  we  may  easily  normalize  each  pattern  so  that  every  variate  has  zero  mean 
and  unit  variance.  In  the  multivariate-site  variable  case,  we  have  to  take 


into  account  other  considerations.  These  are  discussed  below. 


6.  Mul tivar»ate  Site  Variable  Extension 


We  w'll  consider  the  two-variate  site  variable  case  and  then  the  more 
than  two-variate  case  can  be  generalized  by  straightforward  extension.  Ima¬ 
gine  that  we  have,  at  each  s*te,  two  variates  for  the  NB  scheme  (Figure  A a). 


Figure  Aa 
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Figure  Ab. 

It  is  now  necessary  to  construct  a  10  x  10  covariance  matrix  to  incorporate 
all  kinds  of  correlations  between  these  10  variates.  However,  only  the 
correlation  between  one  variate  of  one  site  and  another  variate  of  another 
site  is  unknown.  We  have  to  find  this  correlation  to  form  our  covariance 
matrix. 

Consider  the  1  dimensional  NB  scheme  in  Figure  Ab  where  every  variate 
is  obtained  from  a  I  inear  combination  of  the  two  variates  at  every  site  in 
Figure  Aa.  The  process  in  Figure  Ab  must  preserve  the  original  Markov  pro¬ 
perty  of  the  one-dims'onal  process  discussed  earlier.  In  other  words,  all 
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the  variances  and  covariances  between  sites  in  Figure  4b  can  only  undergo  a 
scale  change  from  the  one-dimensional  case  such  that  the  overall  covariance 
structure  has  the  same  form  and  can  be  obtained  from  (6)  with  some  other 
correlation  parameter  "a". 


The 


2.  2. 


variance  of  site  k  in  Figure  4b  is  c^+c^+Zc^ c^ECx^^x^] .  Therefore 

•>  ?  • 

the  variance  is  increased  by  a  factor  of  c^+c^+Zc^ c^ECx^x^  (remember  the 

original  variate  had  un't  variance).  The  covariance  between  site  k  and  any 

ECx. ,x. '] 
k'  k 


2  2 

other  site  say  i,  is  ECx^x^]  {c,+c,+^cic 


rc2T‘cr2  i Txk;rr 


>  since 


ECx^x^D  =  ECx^x^D  and  ECx^x^D  =  ECx^x^. 

In  order  to  have  the  same  scale  change  for  the  covariance  as  for  the 
variance,  tie  must  impose 


“VV  = 


ECxk,xjt'] 


(10) 


This  will  enable  us  to  construct  our  10  x  10  covariance  matrix.  Another 
consideration  worth  noting  is  the  normalization  process  for  the  multivariate 
case.  An  orthonormal  transformation  is  required  to  transform  the  variates 
into  zero  mean  and  an  identity  matrix  as  their  covariance  matrix.  This  is 
al so-called  the  "Whitening  process"  C73.  The  transformation  matrix  is 
^ ~  *T  where  is  the  eigenvalue  matrix  and  «  =  is  the  eigenvector 

matrix  for  the  covariance  matrix  of  the  particular  class  the  pattern  belongs 
to.  After  the  orthonormal  transformation,  each  site  will  have  two  uncorre¬ 
lated  variates  with  zero  means  and  unit  variance.  Th’s  w’ll  make  our 
overall  (10  x  10)  covariance  simpler  because  of  (10).  The  10  x  10  covari¬ 
ance  matrix  for  the  10  variates  in  the  NB  system  (Fig.  4a)  would  be 


E10x10 


(10) 


where  T?  ’S  the  5x5  covariance  matrix  of  the  1-dim  MB  system  w4th  etements 
B 

evaluated  from  (6)  once  "a"  has  been  estimated. 

It  is  not  hard  to  show  that^Z^g^g  is  positive  definite  since  we  have 
guaranteed  that]C0  's  positive  definite.  Generalization  of  the  more  than 
two  variate  site  variable  case  can  easily  be  carried  out. 


7.  Experimental  Results 


The  classification  procedure  (contextual)  requires  us  to  know  the  four 
neighboring  classes  in  order  to  evaluate  the  jo4nt  density  function.  We  may 
not  need  the  true  class  but  some  class  knowledge  of  those  neighbors  ought  to 
be  known.  We  obtained  these  classes  by  doing  a  "preclassification"  using 
the  simple  (no  context)  decision  rule.  The  contextual  classification  w*  1 1 
then  be  called  the  "postclassification". 

The  data  used  for  the  experiment  is  the  satel I 4te  data  of  Lafayette, 
IND  under  Run  No.  72053609  provided  by  LARS*.  Usually  four  spectral  bands 
are  used  to  collect  Satellite  (LANOSAT)  data  (except  for  Temporal  data). 
However,  it  was  found  that  two  bands  (one  from  the  infrared  region  and  one 
from  the  visible  region)  are  enough  to  produce  maximum  feature  separation. 
The  two  bands  used  are  band  3  (0.7-0. 8  yM),  and  band  6(0. 6-0. 7  yM) . 

The  lack  of  large  homogeneous  areas  in  the  LANDSAT  data  provides  great 
difficulty  for  training  and  testing  purposes.  There  are  seventeen  classes 
in  our  data.  Their  statistics  and  the  two  band  (3,6)  ellipse  plot  are  shown 
in  Figures  5  and  6  respectively. 


We  decided  to  use  the  detailed  analysis  results  performed  by  the  LARS 
professional  analyst  to  do  the  estimation.  The  coding  parameter  estimation 
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Figure  5 •  Cospectral  Plot. 
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Figure  6.  Ellipse  Plot  of  Channel  3  Verses  6 
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result  and  the  evaluated  variances  and  covariances  are  given  ’n  Table  1. 


Band 

a 

voo 

V01 

V11 

V02 

0.7-0. 8  WM 

0.228 

1.489 

0.5372 

0.3216 

0.221113 

r  • 

0.6-0. 7  viM 

0.229 

1.486 

0.5370 

0.3212 

0.2210 

Table  1.  Coding  estimation  result. 


The  statistics  for  the  17  classes  were  obtained  by  the  analyst  at  LARS 
with  the  aid  of  reference  data  such  as  aerial  photography,  U.S.  geographic 
maps  etc.  However,  it  is  extremely  hard  to  use  this  reference  data  to  ob¬ 
tain  performance  accuracy  because,  for  one  thing,  it  <s  hard  to  store 
point-by-po’nt  ground  truth  in  the  computer.  Therefore  we  decided  to  exam¬ 
ine  our  results  visually  from  the  positive-negative  films. 

Two  areas  each  of  size  128  x  128  were  used  for  testing  the  contextual 
algorithm  performance.  These  results  are  shown  in  Figure  7,8  for  block  1 
(Mne  200-327,  col.  120-247)  and  block  2  (Hne  73-200,  col.  90-217)  respec¬ 
tively.  The  results  are  presented  with  the  two-band  and  4-band  simple  rule 
results.  It  can  be  seen  that  residen' :al  areas  (to  the  left  of  the  picture 
in  Fig.  7),  forests  and  pastures  are  all  classified  equally  well  for  the 
three  results.  However,  highway  and  commercial  areas  (white  dots)  are 
detected  much  better  in  the  contextual  rule  results  while  just  barely 
detected  in  the  no  context  4-band  simple  rule  results.  To  further  demon¬ 
strate  this.  Fig.  9  presents  results  of  classes  for  water  (river),  highway, 
and  commercial  areas  as  they  are  all  represented  by  white  color  dots.  It  is 
not  hard  to  see  that,  with  the  help  of  context  information,  these  seldom  oc¬ 
curring  classes  (represented  by  low  a  priori  probabilities)  are  correctly 
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Fig.  7  Experimental  Result  for  Block  1 
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classified.  The  estimated  performance  improvement  in  both  block  areas  is 
about  5X.  ~ 

The  computation  time  for  the  4-neighbor  (NBOR)  context  rule  is  less 
than  for  the  4-band  simple  rule  while  the  performance  is  better. 

r  • 

*LARS,  Laboratory  for  Applications  of  Remote  Sensing,  Purdue  University, 
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STABILITY  OF  GENERAL  TWO-DIMENSIONAL  RECURSIVE  FILTERS 


B.  O’Connor  and  T.  S.  Huang 

INTRODUCTION 

Two-dimensional  recursive  digital  filters  have  generated  much  Interest 
lately.  They  have  the  potential  of  saving  computer  time  and  memory.  After 
defining  recursive  filters  from  a  different  point  of  view  we  will  present  a 
number  of  stability  theorems.  Here,  it  will  be  shown  that  any  general  re¬ 
cursive  filter  can  be  mapped  into  a  first  quadrant  filter.  Furthermore,  a 
theorem  will  be  developed  which  relates  the  stability  of  any  digital  filter 
to  its  two-dimensional  phase  function.  Finally,  a  number  of  practical  sta¬ 
bility  tests  will  be  presented  including  one  which  consists  of  checking  the 
root  distribution  of  one  one-dimensional  polynomial. 

GENERAL  RECURSIVE  FILTERS 

A  general  recursive  filter  can  be  described  by  the  following  recursive 
equation: 

0(m,n)  *>  E  a(r,s)  I(m-r,n-s)  -  E  b(k,i)  0(m-k,n-fc)  (1) 

(r,s) ea  (k,fc)e3-(0,0) 

where  a(r,s)  and  b(k,fc)  are  real  finite  extent  arrays  with  respective  lattice 
supports  of  a  and  3  and  I (m,n)  and  0(m,n)  are  the  respective  input  and  output 
arrays.  Furthermore,  we  will  assume  that  b(0,0)  -  1  and  that  3  =  {(m,n)  : 
b(m,n)  f  0}  is  contained  in  a  lattice  sector  with  center  (0,0)  of  angle  less 
than  n.  These  conditions  guarantee  that  for  a  large  class  of  inputs  equation 
(1)  can  be  solved  by  incrementing  the  values  of  the  indices  (m,n)  In  such  a 
fashion  that  ail  values  of  the  output  can  be  computed  in  turn  from  a  given 
set  of  initial  conditions  [1,2],  In  other  words,  the  stated  conditions  imply 
that  equation  (1)  is  recurs ible  for  many  inputs. 

It  is  our  desire  to  use  equation  (1)  to  Implement  a  linear  shift- 
invariant  (LSI)  system  on  the  Input  !(m,n).  Therefore,  all  initial  conditions 


must  be  zero.  The  impulse  response  of  this  LSI  system  is  defined  by  the 
following  recursive  equation: 

h(m,n)  *  a(m,n)  -  Z  b(r,s)  h(m-r,n-s)  (2) 

(r,s)eB-(0,0) 

in  genera],  the  impulse  response  will  be  nonzero  in  some  shifted  lattice 
sector  of  angle  less  than  n.  In  much  of  the  literature  [3»^*5»6,7]  only  first- 
quadrant  arrays  a(m,n)  and  b(m,n)  are  considered  which  is  just  a  particular 
case  of  the  above  formulation. 

The  following  theorem  summarizes  and  extends  the  above  discussion. 

Theorem  1 :  Let  a(m,n)  and  b(m,n)  be  two  real  finite  extent  arrays  satisfy¬ 
ing  (a)  b(0,0)  =  1,  and  (b)  there  exists  a  (m,n)  t  (0,0)  such  that  b(m,n)  j*  0. 
Then  equation  (1)  is  recursible  if  and  only  if  (iff) 

(1)  B  «  {  (m,n)  :  b(m,n)  i*  0  }  is  a  subset  of  a  lattice  sector  with  center 
(0,0)  of  angle  less  than  n. 

(il)  i(m-r,n-s)  as  a  function  of  (r,s)  does  not  Intersect  B*  (which  is  the 
minimum  angle  lattice  sector  containing  B)  at  an  infinite  number  of 
points  for  all  (m,n). 

If  b(m,n)  satisfies  conditions  (a),  (b) ,  and  (i)  of  the  above  theorem,  then 

it  will  be  called  a  recursive  filter  array.  Associated  with  a  recursive 

& 

filter  array  is  a  lattice  sector  called  B  which  consists  of  all  lattice 

ft 

points  in  the  minimum  angle  sector  containing  B.  B  can  be  uniquely  defined 
by  two  vectors 

B,  -  (Mj .Nj )  e  Z2  and  B2  -  (M2>N2)  e  Z2 

as  B*  *=  SUMpN,)  ,  (H2,N2)] 

2 

-  {  (m,n)  c  Z  :  (m,n)  -  r ^ B|  +  r^*  r\  are  non“ne9at 1 ve  rea'  numbers 
for  i  -  1,  2} 

where  Z  is  the  set  of  integers  and  Hj  and  Nj  along  with  M2  and  N2  are  mutually 
prime  integers. 


109 


For  simplicity,  we  shall  consider  recursive  filters  with  a(m,n)  =  6(m,n) 
(unit  pulse)*  Equations  (1)  and  (2)  now  become 

0(m,n)  ■  i (m,n)  -  E  b(r,s)  0(m-r,n-s)  (3) 

(r,s)e6-(0,0) 

h(m,n)  »  <5(m,n)  -  E  b(r,s)  h(m-r,n-s)  (4) 

(r,s)e£-(0,0) 

There  exists  another  method  of  finding  the  recursive  solution  of  equation 
(4)  which  will  prove  useful  in  the  forthcoming  stability  discussion.  Define 
B(w,z)  for  a  recursive  filter  array  b(m,n)  as 

t 

B(w,z)  *  E  b(r,s)  wrzS.  (5) 

(r,s)e£ 

This  equation  can  be  rewritten  as  B(w,z)  «  1-C(w,z)  where  C(w,z)  is  a  poly** 
nomial  with  no  constant  term.  A  formal  expansion  for  '/B(w,z)  can  be  obtain- 

CD  P 

*/B(w,z)  *  1 ........  *  E  [C(w,z)l  «  E  h(m,n)wmzn  (6) 

1-C(w,z)  n»0 

We  have  proven  in  reference  [8]  that  this  sequence  h(m,n)  is  the  recursive 
solution  of  equation  (4),  that  is,  the  impulse  response  of  the  system.  The 
following  theorem  summarizes  our  findings: 

Theorem  2:  Assume  b(ro,n)  Is  a  recursive  filter  array,  hj (m,n)  Is  the 
solution  of  equation  (4)  and  that  h2(m,n)  Is  the  sequence  gotten  from  equa¬ 
tion  (6);  then  h^m.n)  *  h2(m,n). 

STABILITY  THEOREMS 

In  most  applications  only  bounded-input  bounded-output  (BIBO)  stable 
filters  are  of  interest.  For  a  LSI  system  a  necessary  and  sufficient  condi¬ 
tion  for  BIBO  stability  is  that' E  |h(m,n)|  <  «  .  Therefore,  the  recursive 
filter  represented  by  the  recursive  filter  array  b(m,n)  is  stable  If  and  only 
If  the  impulse  response  obtained  from  equation  (4)  is  absolutely  summable, 


that  is,  I  |h(m,n)|  <  °°  .  If  this  Impulse  response  is  transformed  into 
(m,n)efl* 

another  impulse  response  by  a  mapping  of  the  form 
m"  *  k,nH-k_n 

1  2  (7) 

n"  =  k^m+k^n 

where  kj  e  Z  then  it  Is  still  absolutely  summable.  Furthermore,  if  k^  k^- 
S*  ^3  ^  ®  *  t^ie  resu^ting  sequence  is  Just  the  original  sequence  reordered 
and  thus  if  it  is  stable  then  the  original  Impulse  response  is  stable.  More¬ 
over,  the  resulting  impulse  response  corresponds  to  the  mapped  original  re¬ 
cursive  filter  array  by  g(m,n)  =  b(k^nH-k2n,  k^nrfk^n).  This  fact  follows 
easily  from  equation  (6).  These  observations  are  formalized  in  the  following 
theorem  [8]. 

Theorem  3:  b(m,n)  is  a  stable  recursive  filter  array  Iff  g(m,n)  is  a  stable 
recursive  filter  array  where  g(m,n)  ■  b(k^m+k2n,  kyn+k^n)  where  kjk^-kjkj  ¥  0 
and  k|  cl. 

This  theorem  is  important  because  it  allows  any  class  of  filters  to  be 
mapped  into  any  other  class  while  at  the  same  time  preserving  stability.  In 
particular,  if  b(m,n)  is  a  recursive  filter  array  with  3  »  S[(Mj,Nj)  , 
(^2*^2^  and  ®  =  ^1N2  ~  ^1^2  **  ®  then  t^ie  ^°^ow*n9  mapping  will  change 
b(m,n)  into  a  first-quadrant  filter 

kj  »  sgn  (D)  N2 
k  »  -sgn  (D)  M. 

2  2  (8) 

k^  *>  -sgn  (D)  Nj 

k^  *  sgn  (D)  M} 

Therefore,  tlie  stability  of  a  general  recursive  filter  can  be  determined  by 
testing  the  stability  of  the  first-quadrant  filter  obtained  from  equation  (8), 


Ml 

In  the  literature,  there  exist  a  quarter-plane  stability  testing  procedures 
[3»1*,5,6,7] .  For  example,  any  of  these  methods  can  be  appMed  to  G(W,Z)  In 
the  below  example. 

Example:  B(w,z)  =  .5  w  'z  +  1  +  .85w  +  .lwz  +  .5w2z  *  B*  -  S[(2,-l),  (-1,1)]. 

Equation  (11)  yields  w  -  VC,  z  -  WZ2  to  get  G(W,Z)  -  1  +  .5W  +  .5Z  +  .85WZ  + 

,1W  Z  .  Now  g(m,n)  can  be  shown  to  be  stable;  therefore,  b(m,n)  Is  stable. 

Most  of  the  stability  theorems  of  first-quadrant  filters  relate  stability 

to  the  zero  distribution  of  B(w,z)  «  Z  b(m,n)  \nmzn .  The  following  theorem 

(m,n)eB 

summarizes  these  results. 


Theorem  4:  Let  b(m,n)  be  a  first-quadrant  recursive 

filter  array  then  It 

stable  iff 

0) 

B (w,z)  4  0  | w | 

1  £  1 »  !  z  1  1  1 

I3,*l 

Iff 

(2) 

(a)  B(w,z)  4  0 

|w|  -  1,  |z|  1  1 

[3,91 

(b)  B(w,b)  4  0 

|w|  <_  1  b  a  constant  |b|  £  1 

iff 

(3) 

(a)  B(w,z)  4  0 

on  T2  »  {  (w,z)  :  |w|  ■  1,  |z|  ■ 

n  (7] 

(b)  B(a,z)  4  0 

|z|  i  1  | a J.  «l(|a|  1) 

(c)  B(w,b)  4  0 

| w |  <  l  |b|  <1  (  |b|  -  1) 

iff 

w 

(a)  B(w,z)  4  0 

t2 
on  T 

[7] 

(b)  B(z,z)  4  0 

on  |z|  _<  1 

iff 

(5) 

(a)  B(w,z)  4  0 

T2 

on  T 

[8,10] 

(b)  B^.z*2) 

4  0  |  z  |  <_  1  for  any  £1,^2  e  Z+. 

Utilizing  parts  of  this  theorem  along  with  our 

mapping  theorem  we  can 

obtain  a  stability  theorem  for  general  recursive  filters 
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Theorem  5:  b(m,n)  Is  a  stable  recursive  filter  array  Iff 

(a)  B(w,z)  0  on 

(b)  (I)  If  D  -  MjN2  -  NjH2  ft  0 

B(XPl  XP2)  ft  0  for  | A |  <\ 

for  any  and  p2  where 

P,  =  sgn  (D)  N2  -  sgn  (D)  Nj  ^ 

P2  =  -sgn  (D)  M2  l]  +  sgn  (D)  Mj  l 2 

and  ij  e  Z+ 

(II)  If  D  =  0 

B(XP1,  XP2)  4  0  for  |X|  £  l 
and  pj  =  Mj  ,  P2  *  Nj  *2 

Theorems  (4)  and  (5)  along  with  some  mathematical  results  from  Rudin  [10]  can 
be  utilized  to  prove  the  following  theorem. 

Theorem  6:  Let  b(m,n)  be  a  recursive  filter  array  and  let  B(w,z)  ■  I  b(m,n) 

m  n  (m,n)eB 

w  z  then  the  LSI  system  b(m,n)  represents  Is  stable  Iff 

(a)  B(w,z)  i<  0  on  T^ 

(b)  B(w,l)  and  B(l,z)  have  no  linear  phase  components 

By  a  polynomial  in  z  and  z  '  having  no  linear  phase  components  we  mean  that 
it  has  no  roots  on  unit  circle  and  that  the  phase  function  associated  with 
the  Fourier  transform  of  the  sequence  have  no  linear  term. 

The  conditions  of  this  theorem  are  equivalent  to  those  in  part  (3)  of 
Theorem  k  for  first-quadrant  filters  [7l.  However,  this  theorem  is  much  more 
general  because  It  is  true  for  any  type  of  filter.  Hence,  no  longer  is  it 
necessary  to  check  different  zero  region  conditions  of  B(w,z)  for  each  differ¬ 
ent  type  of  filter  as  done  in  the  past  [3”7l. 
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Conditions  (a)  and  (b)  can  be  restated  respectively  as  (i)  the  Fourier 
transform  of  b(mfn)  Is  never  equal  to  zero,  and  (i!)  there  exist  no  linear 
phase  components  in  the  phase  function.  Dudgeon  [II]  has  shown  that  if  con¬ 
ditions  (i)  and  (ii)  hold  then  the  phase  of  the  Fourier  transform  is  continu¬ 
ous,  odd,  and  periodic.  The  above  observations  suggest  the  following  theorem: 
Theorem  A  recursive  filter  represented  by  a  recursive  filter  array  is 

stable  iff  the  unwrapped  phase  is  continuous,  odd,  and  periodic. 

This  theorem  is  Important  because  it  relates  the  phase  of  a  two-dimension¬ 
al  recursive  filter  to  its  stability.  This  will  aid  in  our  development  of  ef¬ 
ficient,  practical  stability  testing  algorithms. 

PRACTICAL  STABILITY  TESTS 

In  this  section  several  efficient  stability  algorithms  for  general  re¬ 
cursive  filters  will  be  considered.  These  algorithms  test  the  validity  of  the 
conditions  given  in  either  Theorem  (5)  or  (6).  First,  consider  condition  (b) 
of  both  theorems.  Here,  a  one-dimensional  polynomial's  root  distribution  with 
respect  to  the  unit  circle  must  be  determined.  This  Information  can  be  ob¬ 
tained  by  unwrapping  the  phase  of  sequences  which  form  these  polynomials.  A 
very  efficient  and  accurate  phase  unwrapping  method  has  been  recently  devel¬ 
oped  [12].  However,  it  has  been  our  observation  that  for  a  polynomial  of 
order  less  than  two-hundred  or  so,  analytic  methods  such  as  the  Marden-Jury 
table  [13,1*0  test  can  obtain  this  information  more  efficiently. 

If  condition  (b)  of  either  theorem  is  satisfied  then  B(w,z)  must  be 
checked  for  zeros  on  T  .  If  there  exists  a  point  where  the  Fourier  transform 
is  zero  this  implies  that  the  phase  is  discontinuous  at  that  point.  Now,  in 
most  cases  the  existence  of  such  a  discontinuity  can  be  determined  without 
actually  finding  this  point.  Consider  either  for  u,  v  real  numbers 
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Bu(z)  *  E  (  r  b(m,n)  exp(-jum))  zn  (a) 
n  m 

or  (9) 

Bv(w)  ■=  Z  (  E  b(m,n)  exp(-jvn))  wm  (b) 
m  n 

If  there  exists  a  (wc»z0)  “  (exp(juQ),  exp(jvQ))  with  B(wo,zq)  =  0  then  usually 

both  Bu(z)  and  Bv(w)  have  root  distributions  with  respect  to  unit  circle  which 

vary  as  a  function  of  u  and  v  when  viewed  as  polynomials  in  z  and  w  respective* 

1y  near  uq  and  vq.  Thus  if  0  <_  u  <_  n  (or  0  <_  <_  it)  Is  sampled  fine  enough 

so  that  these  discontinuities  can  be  resolved  then  the  existence  of  the  point 

(w  ,z  )  is  substantiated  although  the  values  of  (w  ,z  )  are  not  explicitly 
o  o  o  o 

known . 

The  above  discussion  suggests  the  following  practical  test  for  zeros  on 

2 

T  .  FIrsc,  sample  either  0  <  u  t  or  0  <v  <_  *  or  both  on  a  fine  grid.  The 
intersample  distance  will  determine  the  accuracy  of  the  test.  Next,  evaluate 
the  coefficients  of  Bu (z)  or  B^fw)  for  all  grid  samples.  This  can  be  done 
quite  efficiently  by  employing  the  FFT  algorithm  on  the  rows  of  b(m,n)  to  ob¬ 
tain  the  coefficients  of  Bu(z)  for  U|  ■  or  on  the  columns  of  b(m,n)  to 

obtain  the  coefficients  of  Bv(w).  Next,  some  method  Is  used  to  determine  the 

root  distribution  of  each  B  (z)  or  B  (w)  with  respect  to  the  unit  circle. 

UI  Vi 

For  virtually  all  digital  filter  applications  these  polynomials  will  have 
orders  less  than  a  hundred  so  the  Marden-Jury  table  test  should  be  one  of  the 
most  efficient  methods  for  determining  this.  If,  however,  the  Nyquist  plot 
of  each  polynomial  via  Tribolet's  phase  unwrapping  algorithm  Is  employed  to 
obtain  this  information  then  this  procedure  becomes  a  very  efficient  imple¬ 
mentation  and  generalization  of  DeCarlo's  Nyquist-like  stability  test  [15»1&]. 

i 

The  last  step  of  the  procedure  is  to  check  whether  this  zero  distribution  ever 
changes  as  a  function  of  u  or  v;  if  it  does  vary,  the  filter  is  unstable; 
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If  it  doesn't  change,  the  filter  may  or  may  not  be  stable.  However,  if  a  very 
dense  sampling  grid  was  used  then  we  can  be  "almost  sure"  that  the  filter  is 
stable. 

These  tests  can  be  re- interpret ted  in  terms  of  the  two-dimensional  phase 
function  of  b(m,n).  The  above  methods  determine  whether  the  phase  contains  a 
linear  phase  component  along  a  series  of  either  vertical  (9a)  or  horizontal 
(9b)  lines  [See  figure  1(a)  and  (b)].  This  interpretation  suggests  other 
procedures  which  are  also  motivated  by  Mersereau's  One-Projection  Theorem  and 
the  Projection-Slice  Theorem  [17*18].  Instead  of  approximately  parameterizing 
the  u-v  plane  by  horizontal  or  vertical  lines  we  can  use  slanted  lines  as  de¬ 
picted  in  Fig.  1(c)  and  1(d).  These  correspond  to  projections  of  angle 
tan  '  ('/N)  or  tan  '  (Vm)  where  M  and  N  are  the  length  and  width  of  the  zero- 
padded  b(m,n)  array.  These  projections  can  be  obtained  by  operating  on  the 
one-dimensional  arrays  formed  by  concatenating  the  zero-padded  columns  and 
zero-padded  rows  respectively. 

If  these  one-dimensional  polynomials  contain  a  linear  phase  component  the 

filter  Is  definitely  unstable.  The  accuracy  of  this  test  can  be  increased  by 

* 

adding  more  zeros  to  the  original  array's  columns  or  rows.  Therefore,  for 
large  enough  M  or  N  we  can  be  "almost  sure"  that  a  given  filter  is  stable. 

Figures  2  and  3  summarize  both  the  column  and  rows  tests  and  the  con¬ 
catenation  tests. 

EXAMPLES 

-i  o  -1 

Example:  B^(w,z)  ■  ,5w  z  +  1  +  .85w  +  .lwz  +  ,5w  z 

Condition  7(b)  : 

Bj (w, 1 )  «  .5w  1  +  1  +  ,95w  +  .5w2 

B| (1 ,z)  -  .5z_1  +  1.85  +  .6z 

Both  have  no  linear  phase  components 
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Does  either 

- 

FFT  each  (Iength2^) 

b(m.n)  recursion 

B(w,l)  or  B(l,z) 

no 

row  (column)  of 

filter  array 

have  a  linear  phase 

b(m,n)  and 

component 

store 

Find  zero  distribution 
with  respect  to  unit 
circle  of  each 
column  (row)  in 
above  array 


or 


Increase  FFT  size 
to  desired  accuracy 
limit  and  repeat. 

If  K  is  large  enough 
then  f i I  ter  is  almost 
surely  stable. 


Phase 

Harden- Jury 

unwrapping 

table  (low 

method 

and  moderate 

(high  order 

length 

sequences) 

sequences) 

Efficient  stability  algorithms  for  two-dimensional  recursive  filters 
(row  and  column  algorithms). 


r  • 


r  *  *1 


i  *  ’ 


1 


FI  CURE  2 


t 


yes 


Stability  algorithms  for  two'  Jiwrsioniil  fillers  which  require  only 
ore  one-d imcns iona I  root  distribution  test  (concatenation  algorithms). 


FIGURE  3 
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Condition  7(a);  [row  test] 

Each  row  was  transformed  via  a  1024  length  FFT  and  the  root  distribution  r  #' 

of  the  resulting  columns  revealed  that,  at  least  for  this  accuracy,  Bj(w,z)?*0 
2 

on  T  .  (Since  b(.n,n)  is  real  only  approximately  half  the  number  of  root  dis¬ 
tribution  tests  are  needed).  Therefore,  b(m,n)  is  "most  likely"  a  stable  r  • 

f i 1  ter  array. 

Condition  7(a):  [concatenation  test] 

(»)  Bj(z,z  )  =  .5z  +  .lz  +  1  +  .85z  +  .52  r  •  • 

(Ii)  Bj (w  ,w  )  =  .5w  +  1  +  .lw  +  ,85w  +  .5w 

For  these  to  have  no  linear  phase  terms  the  number  of  roots  inside  the  unit 

circle  for  (i)  should  equal  N+1  and  M+l  for  (il)  where  M  and  N  are  the  lengths  r  •  ' 

of  the  zero  padded  rows  and  columns  respectively.  Below  is  a  table  which  lists 
results  of  using  the  concatenation  algorithm  for  different  values  of  M  and  N. 


The  root 

distributions  were 

determined  via  a 

modified  adaptive  phase  unwrapp- 

• 

ing  algorithm  [24],  Notice 

that  the  zero  distributions  always  corresponds  to 

that  of 

a  stable  filter. 

TABLE  1 

• 

M  or  N 
Value 

Length  of 

M  Sequence 

Length  of 

N  Sequence 

Ideal  Zero 
Distribution 

Ins ide  Uni t  C i rcle 

M  and  N 

Sequence  Zero 
Distribution 

Ah 

12 

37 

27 

13 

13  "  13 

• 

18 

55 

39 

19 

19  -  19 

36 

109 

75 

37 

37  -  37 

72 

217 

147 

73 

73  "  73 

9 

144 

433 

291 

145 

145  -  145 

288 

— 

579 

289 

-  289 

i 
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t : 


>fs 


i.- 


—  i  2—1 

Example:  =  «5w  2  +  1  +  .89w  +  .  Iwz  +  ,5w  z 


B2(w»0  and  82(1, z)  have  no  linear  phase  term.  When  the  row  algorithm  is 
employed  a  phase  discontinuity  occurs  for  FFT  sizes  of  32  or  larger.  There¬ 
fore  b^m.n)  is  definitely  unstable.  Alternately  this  fact  can  be  determined 
by  using  a  concatenation  algorithm. 


Concatenations  of  zero  padded  rows  and  columns  yield  (i)  B-(z,z"^)  = 


.5*-"-'  +  .1*-**'  *  I  *  .8Sz  ♦  . 5zM+2.  (II)  B2(w",„-')  -  .Sw-"-'  «■  I  * 


jyj  2H+1 

+  .89w  +  .5w  .  If  b(m,n)  is  stable  then  the  number  of  zeros  inside  the 


unit  circle  of  the  above  polynomials  should  equal  N+l  and  M+l  respectively  for 
all  values  of  M  and  N.  The  following  table  indicates  results  for  different 
values  of  M  and  N.  A  phase  discontinuity  appears  when  M  ^  18  and  N  >_  1 2  and 


hence,  this  test  also  indicates  that  there  exists  a  (wq,zo)  e  1  such  that 


B(wq,zo)  =  0  so  b(m,n)  is  unstable. 


TABLE  2 


■  1 

) 


H  or  N  Length  of  Length  of  Ideal  Zero  M  and  N 

Value  M  Sequence  N  Sequence  Distribution  S^qu^nce 

Inside  Unit  Circle  Distribution 
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The  row  algorithm  detects  a  phase  discontinuity  for  FFT  sizes  of  128  or  larger 
while  the  column  algorithm  detects  a  discontinuity  for  FFT  sizes  of  32  or 
larger.  The  concatenation  algorithms  indicate  a  phase  discontinuity  for 
B(z,z^)  for  N  =*  1 8  and  for  B(w^,w)  for  M  =  36.  Therefore,  these  algorithms 
indicate  that  b(m,n)  is  unstable. 

If  Ekstrom’s  cepstrum  test  [19]  is  applied  to  this  example  no  conclusion 
can  be  obtained  for  FFT  sizes  as  large  as  61>x64.  Hence,  our  algorithms  are 
far  superior  both  in  computation  time  and  sensitivity  than  the  cepstrum  sta¬ 
bility  test. 

CONCLUDING  REMARKS 

We  have  presented  recursive  filtering  from  a  somewhat  different  point  of 
view.  A  general  mapping  theorem  has  been  formulated  which  allows  any  type  of 
filter  to  be  mapped  into  a  first-quadrant  filter.  This  first-quadrant  filter 
is  stable  if  and  only  if  the  original  filter  was  stable.  Several  general  sta¬ 
bility  theorems  which  relate  stability  to  the  zero  set  of  B(w,z)  have  been 
presented.  These  theorems  led  to  the  conclusion  that  a  filter  is  stable  if 
and  only  if  its  phase  function  is  continuous,  odd,  and  periodic.  This  ob¬ 
servation  suggested  several  practical  stability  testing  algorithms.  Among 
these  are  several  methods  which  appear  to  be  extremely  efficient  for  high 
order  filters.  All  the  results  in  this  paper  can  be  generalized  to  n-dimen- 
sional  filters.  Moreover,  the  practical  stability  tests  can  be  applied  to 
any  finite  two-dimensional  array  to  determine  if  its  cepstrum  exists  by  de¬ 
termining  if  the  Fourier  transform  of  the  array  ever  equals  zero. 


•  < 

* 
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SEGMENTATION  OF  TACTICAL  TARGETS  IN  FUR  IMAGERY 


S.  G.  Carlton  and  0.  R.  Mitchell 

I. INTRODUCTION 

We  have  been  continuing  cooperation  with  Honeywell  System  and  Research 
Division  in  developing  a  system  to  detect  and  recognize  tacticat  targets  in 
FUR  imagery.  Although  the  ultimate  goal  is  to  incorporate  syntactic  methods 
wherever  appropriate  and  useful,  this  report  describes  statistical  segmenta¬ 
tion  methods  and  introduces  a  potentially  useful  classification  method. 

Shown  in  Fig.  1,  upper  left  corner,  and  in  Fig.  3  and  8  are  sample  FUR 
tactical  targets.  These  images  are  thermal  and  several  characteristics  apply 
to  active  vehicles:  (1)  the  motor  is  usually  visible  as  a  hot  spot,  (2)  edges 
can  be  detected  within  and  along  the  target,  and  (3)  the  average  grey  level 
(temperature)  of  the  object  is  often  different  from  the  background.  These 
characteristics  are  presently  used  by  the  Honeywell  Autoscreener  System  to 
locate  potential  target  areas. 

The  techniques  described  here  assume  that  the  images  have  been  pre- 
processed  by  the  Autoscreener  and  the  approximate  location  and  size  of  each 
potential  target  are  known.  We  attempt  a  segmentation  into  target  and  non¬ 
target  pixels  in  the  target  area  and  proceed  to  look  at  a  method  for  classi¬ 
fying  the  segmented  targets. 

II.  SEGMENTATION  FEATURES 

To  accomplish  target  segmentation,  background  statistics  are  collected 
over  an  annular  region  surrounding  the  potential  target.  Then  the  statistics 
of  the  target  region  are  compared  to  those  of  the  background,  and  points  not 
matching  the  background  are  labeled  as  target  points.  As  is  evident  from  the 
sample  images,  grey  level  alone  is  not  always  enough  information  for  accurate 
segmentation,  so  additional  features  are  necessary  for  the  process. 


25 


Hopefully,  once  the  right  features  are  selected  any  points  that  have  different 
features  than  the  surrounding  background  points  will  be  part  of  the  target. 

The  two  additional  features  are  chosen  to  complement  the  grey  level 
image.  These  are  texture  and  edges.  The  texture  was  chosen  because  It  seems 
probable  that  object  and  background  textures  would  not  be  identical,  assuming 
a  good  texture  measure  were  available  to  differentiate  among  textures.  The 
edges  were  chosen  as  a  feature  due  to  the  predominance  of  edges  along  the 
target  background  interface  and  the  fact  that  grey  level  (temperature)  and 
texture  become  ambiguous  near  the  object  boundaries.  The  edge  and  texture 
features  are  shown  in  Fig.  1  at  the  upper  right  and  lower  left,  respectively. 

The  edge  feature  is  a  gradient  type  measurement  measured  over  a  5x5 
window  for  each  point.  The  absolute  difference  between  the  upper  10  points 
and  the  lower  10  points  is  compared  against  the  absolute  difference  between 
the  left  10  points  and  the  right  10  points.  The  center  point  is  then  replaced 
by  the  maximum  of  the  absolute  values  of  these  two  differences.  This  process 
is  repeated  for  each  point  in  the  original  image  to  produce  the  edge  feature 
image.  Figs.  **  and  9  are  also  edge  feature  images  based  on  the  original  images 

in  Figs.  3  and  8,  respectively. 

The  texture  feature  is  derived  from  the  max-min  local  extrema  described 
in  previous  reports  [1-2],  Local  grey  level  extrema  are  measured  in  hysteresis 
smoothed  versions  of  the  original  iamge  using  three  smoothing  thresholds.  The 
lowest  level  extrema  correspond  mostly  to  noise  In  the  image,  whereas  the 
highest  correspond  mostly  to  edges.  The  remaining  medium  level  extrema  are  a 
measure  primarily  of  the  texture  in  the  Image.  These  medium  level  extrema 
locations  are  shown  in  Fig.  2  beside  the  original  tank  Image  from  Fig.  I.  Fig.  5 
shows  the  medium  level  extrema  extracted  from  Fig.  3. 
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The  texture  feature  image  Is  created  from  the  extrema  by  averaging  the 
number  of  medium  level  extrema  in  every  10x10  window  in  the  image  and  re¬ 
placing  the  center  point  of  that  window  with  the  average.  Texture  feature 
images  are  shown  in  Fig.  1  (lower  left),  Fig.  6,  and  Fig.  10. 

111.  SEGMENTATION  PROCEDURES 

Once  the  feature  images  are  produced  two  concentric  circles  are  centered 
at  each  potential  target  as  derived  from  the  Honeywell  preprocessing  system. 
The  Inner  circle  represents  the  potential  target  area  and  the  annular  region 
between  the  two  circles  represents  the  background  region.  In  an  automated 
system  these  circle  sizes  would  be  adaptive  since  approximate  target  size 
and  background  context  will  also  be  available  from  prior  processing  stages. 

In  our  implementation  of  the  system  here,  the  inner  radius  was  fixed  at  30 
pixels  and  the  outer  radius  at  60  pixels.  The  background  annular  region  must 
be  large  enough  to  allow  a  sufficient  background  sample  to  be  collected  but 
It  must  not  include  target  points  or  be  so  large  that  irrelevant  background 
obscures  the  background/target  differences. 

The  present  background  statistics  gathering  program  generates  a  three- 
dimensional  histogram  over  the  original  and  two  features  for  all  background 
points.  The  quantization  selected  allows  for  32  original  grey  levels,  8 
edge  values,  and  16  texture  values.  This  background  histogram  is  therefore 
composed  of  1»096  bins. 

Once  the  background  3-D  histogram  is  completed,  each  potential  target 
point  (3-D  vector)  is  compared  against  its  background  bin.  If  that  feature 
combination  occurs  often  in  the  background,  the  point  is  considered  another 
background  point.  If  the  feature  combination  does  not  occur  in  the  back¬ 
ground,  that  point  is  labeled  a  target  point. 

The  results  of  this  process  are  shown  in  Fig.  1  (lower  right).  The 
target  test  was  done  over  the  whole  Image  instead  of  just  the  inner  circle 
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to  give  us  an  idea  of  background  rejection  of  our  process.  Segmentations 
using  this  process  are  also  shown  in  Figs.  7  and  11.  The  segmentation  results 
using  this  technique  are  very  encouraging  and  the  potential  for  improvement 
is  also  high. 

Present  attempts  at  improvement  of  this  method  include  a  better  texture 
measure  and  better  statistics  data.  The  local  extrema  over  all  smoothing 
thresholds  contain  much  more  texture  information  than  is  presently  retained 
In  our  average  over  all  medium  extrema.  We  are  attempting  to  develop  an 
adaptive,  robust  technique  which  would  measure  those  texture  properties  most 
evident  In  each  image.  The  single  background  histogram  might  be  replaced  by 
looking  at  statistics  over  small  windows  of  the  background.  This  would  allow 
the  background  to  be  represented  by  the  statistics  of  only  a  few  "average" 
windows.  This  would  allow  background  classification  as  well  as  background/ 
target  separation. 

IV.  CLASSIFICATION  BY  PROJECTIONS 

The  segmentations  produced  by  the  method  previously  described  produce 
results  which  are  sometimes  fragmented  and  contain  drop-out  and  extraneous 
points.  A  classification  scheme  which  is  somewhat  insensitive  to  these 
variations  would  be  appropriate.  We  are  presently  investigating  the  use  of 
projections  through  the  segmented  object  to  derive  classification  features. 
This  type  of  structure  recognition  method  is  being  developed  by  New  Mexico 
State  University  for  missile  tracking  at  the  White  Sands  Missile  Range  [ 3 3 • 

It  has  the  advantage  that  the  integration  process  of  the  projections  averages 
out  many  of  the  noise  problems  inherent  in  thermal  images  and  our  classifi¬ 
cation  method. 

Shown  in  Fig.  12  are  eight  projections  through  a  segmented  object  (back¬ 
ground  points  set  to  zero,  target  points  remain  at  their  original  grey  level). 


The  object  is  a  tank  and  is  shown  in  the  middle  of  Fig.  12.  The  small  circles 
along  the  horizontal  axes  represent  102  area  increments  along  the  projections. 
The  numbers  printed  below  are  the  distances  between  the  102  area  increments 
normalized  so  that  the  total  distance  (representing  64  pixels  horizontally) 
in  1000.  Note  that  these  projections  retain  several  characteristic  features 
of  the  tank:  1)  the  motor  hot  spot;  2)  the  cool  region  between  the  motor  and 
front  of  the  tank;  3)  the  cannon  barrel;  and  4)  the  rectangular  shape. 

Shown  in  Fig.  13  are  the  projections  through  another  tank  from  the  same 
aspect  angle  (note  the  similarities).  Fig.  14  shows  projections  through  a 
different  tank  at  a  different  angle  using  a  different  FLIR  sensor.  Although 
shape  information  is  extractable,  this  target  would  have  to  be  learned 
separately  from  the  first  two.  Fig.  15  shows  the  projections  through  a  seg¬ 
mented  APC.  Here  the  differences  between  an  APC  and  a  tank  are  most  evident 
in  the  front  of  the  objects. 

The  eventual  goal  of  this  projection  method  is  to  derive  invariant 
measures  (such  as  ratios  of  the  102  area  increment  distances)  on  several  of 
the  projections  (such  as  narrowest  and  widest)  so  that  objects  of  interest 
can  be  classified. 
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Fig.  3  Sixteen  target  image  blocks 
(128x128)  of  NVL  FUR  data. 

Blocks  3  and  12  do  not  contain 
targets 


Fig.  b  Extracted  edges  from  Fig.  3 


Fig.  7  Segmented  targets  from  Fig. 
using  original,  edges,  and  texture 
as  three  features. 


l'9i  }lPreJ?Ct!?nS  se9mented  ng.  14.  Projections  from  segmented 

tank  (Fig.  11,  block  6)  tank  (Fig.  7,  block  7) 


Fig.  15.  Projections  from  segmented 
APC  (Fig.  7,  block  4) 
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REAL  TIME  TARGET  TRACKING  AT  WHITE  SANDS  MISSILE  RANGE 
0,  R.  Mitchell 

I.  INTRODUCTION 

Research  was  initiated  this  summer  between  Purdue  and  White  Sands  Missile 
Range  to  design  segmentation  and  structure  analysis  algorithms  that  operate 
reliably  on  tracking  imagery.  In  order  to  apply  the  higher  level  information 
extraction  and  symbol  manipulation  methods  developed  under  the  ARPA  grant, 
some  lower  level  techniques  must  be  developed  for  preprocessing,  target  ex¬ 
traction,  and  target  description.  In  order  to  accomplish  these  tasks  a  small 
grant  has  been  obtained  from  the  Army  Research  Office.  Thus,  general  pro¬ 
cedures  developed  under  the  ARPA  grant  may  be  applied  to  this  project  while  a 
dedicated  effort  to  the  tracking  problem  is  sustained  by  the  ARO. 

II.  TRACKING  SYSTEM  REQUIREMENTS 

Many  applications  require  the  precise  location,  orientation,  and  descrip¬ 
tion  of  moving  objects  in  real  time.  Our  work  at  WSMR  this  summer  has  exposed 
us  to  the  special  requirements  of  some  tracking  problems  which  are  best  solved 
by  optical  imaging  tracking  systems. 

Present  optical  tracking  systems  are  usually  "one  concept"  trackers, 
using  either  contrast  or  correlation.  The  contrast  trackers  are  high  speed 
but  lack  the  versatility  to  follow  a  target  in  complex  scenes.  The  template 
matching  or  correlation  trackers  require  a  lot  of  computation  for  reasonable 
window  sizes  and  are  not  easily  adapted  to  changing  aspect  angles,  missile 
staging,  and  other  common  scene  changes. 

The  need  for  a  "smart"  optical  tracker  exists  which  would  operate  in  real 
time  and  could  adapt  to  target  orientation  and  shape  changes  (due  to  staging, 
visible  emissions,  etc.),  and  to  partial  or  total  obscuration  of  the  target  at 
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Presently,  New  Mexico  State  University,  under  a  grant  from  the  Army 
Research  Office,  is  developing  a  prototype  system  with  some  of  the  capabilities 
mentioned.  However,  the  technical  aspects  of  target  segmentation  and  structure 
analysis  need  to  be  studied  in  much  more  detail  in  conjunction  with  the  use  of 
actual  video  data  to  develop  a  system  that  is  robust  under  the  normal  variety 
of  tracking  situations  encountered.  We  are  presently  doing  this  study. 

III.  DATA 

We  helped  develop  a  system  at  WSMR  this  summer  to  put  digitized  video 
data  on  IBM  compatible  format  7“track  magnetic  tape.  We  presently  have  20 
digitized  video  images  representing  some  tracking  situations.  Each  image  is 
one  video  field  consisting  of  240  by  512  points  with  256  grey  levels.  Because 
the  video  interlace  is  not  used,  the  horizontal  resolution  is  twice  that  of 
the  vertical.  The  target  sections  of  sixteen  fields  are  shown  in  Fig.  1.  Each 
field  has  been  trimmed  from  240  by  512  to  64  by  128.  An  estimate  of  the  motion 
involved  can  be  determined  from  the  last  two  targets.  These  images  are  the 
two  fields  from  one  TV  frame,  so  that  the  changes  visible  have  occurred  in  only 
1/30  second. 

Future  research  would  include  digitizing  several  continuous  flights  (e.g., 
60  fields  per  second  for  2  seconds)  and  using  this  data  for  development  and 
testing  of  algorithms. 

IV.  RESEARCH  DIRECTIONS 

Our  research  can  be  divided  into  two  major  areas:  (1)  segmentation  of 
potential  targets  from  the  background,  and  (2)  structure  analysis  of  the  seg¬ 
mented  targets. 

Segmentation  requires  high  speed  processing  of  picture  points  to  identify 
potential  target  areas  and  more  detailed  processing  of  these  areas  to  extract 
a  binary  image  of  the  potential  target. 


Structure  analysis  determines  if  the  object  extracted  is  indeed  the  one 
to  be  tracked,  the  exact  location  and  orientation  of  the  object,  and  recent 
changes  that  have  occured  In  the  object. 

Both  the  segmentation  and  structure  analysis  sections  will  use  informa¬ 
tion  obtained  from  previous  frames  that  is  stored  and  modified  in  real-time 
to  allow  significant  changes  in  the  tracked  object  during  a  mission. 

The  segmentation  of  the  target  from  the  background  is  the  most  critical 
timing  problem  for  real-time  implementation.  At  video  rates,  a  new  picture 
element  (pixel)  arrives  approximately  every  100  nanoseconds.  Thus  either  all 
initial  operations  must  be  simple  or  the  area  to  be  processed  must  be  reduced 
by  prediciting  the  target  location  in  the  iamge  before  processing.  It  is  our 
opinion  that  the  system  would  operate  more  accurately  if  the  entire  picture  is 
processed  every  frame.  This  reduces  the  severe  requirement  of  accurate  pre¬ 
diction  of  target  location  in  the  picture  and  allows  the  system  to  acquire  and 
discriminate  Interfering  objects  before  they  come  confusingly  close  to  the 
target  being  tracked. 

A. Preprocessing 

With  the  large  variety  of  incoming  im-ges,  it  is  often  necessary  to  do 
some  preprocessing  to  make  the  system  more  Invariant  to  changes  in  lighting, 
viewing  angle,  and  resolution.  It  Is  also  useful  to  enhance  those  aspects  of 
the  Image  which  indicate  the  presence  of  targets  (such  as  edges  and  texture  of 
man-made  objects).  We  are  investigating  the  following  preprocessing  operations: 
a  logarithmic  input  transformation,  a  two-dimensional  median  filter,  a  two- 
dimensional  human  visual  system  (HVS)  filter,  and  averaging. 

These  processes  are  window  operations  that  can  be  performed  in  real-time 
over  the  entire  incoming  video  picture.  The  output  of  these  processes  will 
allow  the  selection  of  suitable  potential  target  regions  for  more  detailed 
processing. 
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1.  Logarithmic  Input  Transformation 

Working  with  the  densities  (log  intensities)  allows  the  system  to  be  more 
invariant  to  illumination  and  iens  opening  changes  since  these  produce  multi¬ 
plicative  effects  in  the  intertsity  image  and  additive  effects  in  the  density 
image.  The  logarithm  also  emphasizes  the  detail  in  the  darker  areas  of  the 
image  which  is  especially  useful  to  the  texture  processing  algorithms  described 
later.  The  logarithm  Is  also  an  approximation  to  the  visual  system's  action  on 
an  incoming  pattern.  Our  research  should  indicate  whether  this  non-linear 
operation  is  appropriate  or  necessary  for  typical  video  tracking  data. 

2.  Two-dimensional  Median  Filter 

The  median  filter  allows  for  video  dropouts  including  single  point  or 
single  line  mistakes  in  the  analog  video  chain  or  the  digitizer.  A  simple 
median  filter  replaces  the  center  point  of  each  3x3  window  with  the  median  of 
all  9  points  in  the  window.  This  non-linear  process  eliminates  high  frequency 
noise  but  preserves  monotonic  edges  precisely.  Shown  in  Fig.  2  (upper  left 
corner)  is  the  digitized  video  of  a  cruise  missile  with  the  contrast  greatly 
enhanced.  The  results  of  processing  this  with  a  3x3  median  filter  is  shown  in 
the  upper  right  of  Fig.  2.  Notice  that  the  edges  are  preserved  but  the  noise 
is  reduced  and  the  data  seems  more  correlated.  The  results  of  averaging  the 
median  filtered  picture  over  a  3x5  window  are  shown  in  the  lower  left,  and  in 
the  lower  right  is  the  result  of  averaging  the  orlgtnaJ.  Shown  in  Fig.  3  are 
the  thresholded  versions  of  Fig.  2.  It  appears  that  the  median  filter-averager 
combination  works  best  for  this  data  (the  median  filter  removes  bad  points 
before  they  can  be  smeared  by  the  averager). 

3.  Two-dimensional  Human  Visual  System  Filter 

It  is  well  known  that  the  human  visual  system  (HVS)  responds  to  incoming 
images  with  an  approximate  logarithmic  transformation  followed  by  a  spatial 
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and  temporal  bandpass  filtering  operation.  We  are  investigating  the  usefulness 
of  such  a  spatial  bandpass  filter  in  extracting  objects  for  tracking. 

A  suitable  filter  function  which  is  to  be  convolved  with  an  input  image 
is  shown  in  Table  ?. 


Table  1.  Two-dimensional  HVS  Filter  Function 
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0 

0 

0 

-.25 

0 

0 

0 

0 

0 

0 

0 

-.25 

-.5 

-.25 
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0 
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-.25 

-.25 

+.25 

-.25 

-.25 

0 

0 

0 

-.25 

-.25 

+.5 

+1.0 

+.5 

-.25 

-.25 

0 

.25 

-.5 

+.25 

+1.0 

+1.5 

+1.0 

+  .25 

-.5 

-.25 

0 

-.25 

-.25 

+.5 

+1.0 

+.5 

-.25 

-.25 

0 

0 

0 

-.25 

-.25 

+.25 

-.25 

-.25 

0 

0 

0 

0 

0 

-.25 

-.5 

LA 

CM 

• 

1 

0 

0 

0 

0 

0 

0 

0 

-.25 

0 

0 

0 

0 

The  values  are  chosen  to  make  computation  of  the  fractional  values  easy  for 
real-time  implementation.  Because  our  video  data  has  twice  the  resolution  in 
the  horizontal  direction  than  in  the  vertical  (only  one  video  field  is  pro¬ 
cessed  at  a  time),  the  HVS  filter  Is  modified  as  shown  in  Table  2. 

Table  2.  Modified  HVS  Filter  to  Allow  Half  Resolution 
in  Vertical  Direction 


0 

0 

0 

-.25 

-.75 

-.25 

0 

0 

0 

0 

-.25 

-.5 

+.25 

+1.25 

+  .25 

-.5 

-.25 
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-.25 

-.5 

+.25 

+  1.0 
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+  1.0 

+.25 

-.5 

-.25 

0 

-.25 

-.5 

+.25 

+1.25 

+  .25 
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-.25 

0 

0 

0 

0 

-.25 

-.75 

-.25 

0 

0 

0 
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for  a 

small  obj 
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which  just  covers  the  plus  area.  The  filter  shown  In  Table  2  was  used  to  pro¬ 
duce  Fig.  5  using  Fig.  h  as  the  Input.  Note  that  edges  are  emphasized  and 
that  the  missile  (above  the  plume)  Is  enhanced. 

Our  research  should  indicate  the  appropriateness  of  such  a  filter  and  the 
parameters  best  suited  to  video  tracking  data. 

B.  Region  Growing  Using  Window  Statistics 

After  preprocessing  the  input  video  data,  potential  target  regions  must  be 
extracted.  Depending  on  the  data,  this  task  may  be  a  simple  thresholding 
operation  or  it  may  be  a  more  complex  extraction  procedure.  We  choose  to 
formulate  the  segmentation  in  terms  of  measuring  statistics  over  each  nxm 
window  in  the  preprocessed  picture  and  assigning  each  window  to  one  of  several 
classes. 

The  measurements  which  can  be  made  on  a  nxm  window  range  from  a  simple 
histogram  of  the  data  points  to  a  representation  in  (n) (m) -dimensional  vector 
space.  Of  course,  the  simplest  representation  that  will  give  a  proper  seg¬ 
mentation  is  the  most  desirable.  This  selection  can  only  be  made  after  ex¬ 
perience  is  gained  by  segmenting  a  large  amount  of  typical  tracking  data. 

Once  a  window  parameter  is  selected,  points  must  be  clustered  to  give 
appropriate  regions.  This  clustering  must  be  based  upon  some  distance  measure 
between  two  window  parameters.  We  are  investigating  the  following  window 
parameters,  distance  measures,  and  clustering  techniques: 

1.  First  Order  Measure 

We  are  using  the  histogram  of  the  points  In  each  window  as  the  window 
parameter.  The  distance  measure  will  be  the  Bayes  error  overlap  between  the 
two  modified  histograms: 


d  ■  1  -  /  min  [f (x)  *  hj (x) ,  f (x)  *  h2(x)]  dx  (1 

where  h^  (x)  and  h^Cx)  are  the  grey  level 
histograms  over  the  two  windows  and 
f(x)  Is  a  smoothing  and  normalizing  function 
to  be  convolved  with  each  histogram 

The  smoothing  function  is  chosen  so  that  the  modified  histogram  has  an  area 
of  unity  and  is  smoothed  so  that  the  distance  measured  by  Eq.  (I)  is  not  ad¬ 
versely  affected  by  the  discrete  nature  of  the  original  histograms. 

An  example  is  shown  in  Figs.  6  and  7.  Then  +'s  in  Fig.  6  represent  the 
histogram  from  window  1  and  the  0's  represent  the  histogram  from  window  2. 

The  functions  p^ (x)  and  p^(x)  in  Fig.  7  represent  the  smoothed  histograms. 

The  shaded  region  is  the  Integral  in  Eq.  (1).  This  area  is  equivalent  to  the 
probability  of  error  in  estimating  from  which  of  these  density  function  esti¬ 
mates  comes  a  sample  grey  level. 

The  distance  measure  is  now  used  to  cluster  windows  that  are  close  to¬ 
gether  to  form  segmented  regions.  Prior  frame  information  should  be  useful 
here  to  provide  initial  cluster  groupings.  Windows  that  are  more  than  a 
threshold  distance  away  from  all  known  clusters  will  be  used  to  form  new 
clusters. 

2.  Second  Order  Measure 

In  tracking  situations  where  first  order  measures  do  not  allow  proper 
segmentation,  a  second  order  measure  may  be  applied.  We  will  use  a  modified 
joint  histogram  relating  pairs  of  points  within  a  window: 

p(x,y)  *  f(x,y)  *  h(x,y)  (2 

where  h(x,y)  Is  the  joint  histogram  of  pairs  of  adjacent 
points  (co-occurrence  matrix)  and  f(x,y)  is  a  two-dimensional 
smoothing  and  normalizing  function. 
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The  co-occurrence  matrix  has  been  applied  to  texture  analysis  and  has 
shown  encouraging  results.  However,  instead  of  calculating  parameters  on  the 
matrix  as  is  normally  done,  we  will  use  the  matrix  itself  in  a  two-dimensional 
version  of  the  overlap  error  measurement  of  distance, 

d  -  1  -  //  min  [p, (x,y) ,p2 (x,y) ]  dxdy  (3) 

Higher  order  measures  are  probably  impractical  but  could  be  investigated 
If  these  initial  measurements  show  obvious  shortcomings. 

C.  Local  Extrema  Information 

The  detection  of  local  grey  level  extrema  in  an  image  has  shown  to  be 
useful  in  texture  classification  and  image  segmentation  [1].  The  detection 
algorithm  for  this  operation  is  as  follows: 

In  a  nxm  window,  compare  the  grey  level  of  the  center  point  with  those 
of  its  two  vertical  neighbors.  If  it  is  above  noth  neighbors,  the  center 
point  Is  a  local  maximum  in  the  vertical  direction.  If  this  is  the  case, 
compare  the  center  point  with  each  point  along  each  vertical  direction  unit 
a  grey  level  is  encountered  which  is  above  the  center  point's  value  or  until 
the  edge  of  the  window  is  encountered.  The  alrgest  differences  between  grey 
levels  in  each  vertical  direction  are  then  compared  and  the  smallest  of  the 
two  is  retained  as  the  size  of  the  local  maximum  In  the  vertical  direction. 

An  example  Is  shown  in  Table  3  for  a  5x7  window. 

Table  3.  Sample  Grey  Levels  for  Extrema  Detection 


36 

40 

47 

30 

24 

33 

34 

30 

32 

36 

36 

40 

32 

40 

30 

42 

46 

45 

43 

35 

36 

40 

33 

47 

32 

34 

42 

50 

42 

40 

30 

30 

20 

45 

36 
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The  center  value  is  45.  The  47  ends  the  search  in  the  top  direction  and  the 
50  ends  the  search  in  the  bottom  direction.  The  range  is  15  above  and  12 
below.  Therefore  the  center  point  is  a  local  maximum  in  the  vertical  direc¬ 
tion  of  size  12.  If  the  point  is  a  local  minimum  instead  of  a  maximum,  the 
process  is  done  in  the  same  way  interchanging  the  above  and  below  comparison 
tests. 

This  process  is  also  done  in  the  horizontal  direction.  In  the  example 
of  Table  3,  the  center  point  is  not  a  local  extreme  in  the  horizontal 
direction.  If  a  point  is  a  local  extreme  in  both  horizontal  and  vertical 
directions,  only  the  largest  of  the  two  is  retianed  at  that  location.  The 
extrema  detection  process  is  equivalent  to  local  maximum  and  minimum  de¬ 
termination  following  hysteresis  smoothing  of  various  amounts. 

Figure  8  shows  the  output  of  such  an  operation  on  the  image  in  Fig.  5. 
Extreme  size  is  indicated  by  the  grey  level.  No  distinciton  is  made  in  the 
figure  between  maxima  and  minima  or  between  horizontal  or  vertical  extrema. 

Note  that  the  edges  emphasized  by  the  HVS  filter  now  are  marked  by  ex¬ 
trema.  The  missile  orientation  can  be  extracted  from  its  edge  information, 
however,  the  algorithm  must  be  high-level  enough  so  that  the  missile  edge  Is 
extracted  and  not  the  plume.  (The  missile  orientation  angle  may  not  be  the 
same  as  its  flight  direction.) 

Also  the  texture  of  various  regions  can  be  characterized  by  the  types 
and  number  of  extrema  present.  The  region  characterization  may  be  useful  for 
background  classification  and  for  plume  Identification.  Parameters  for  this 
measurement  are  extracted  by  counting  the  number  of  various  size  extrema 
within  a  window  surrounding  each  point. 


D.  Edge  Information 

The  human  visual  system  is  very  sensitive  to  edges  and  lines  [2]. 

Patterns  containing  edges  are  much  more  visible  than  those  conta-ning  the  same 
signal  power  but  lacking  well-defined  edges.  Also,  many  objects  to  be  tracked 
are  distinguished  from  natural  environment  background  by  the  presence  of 
edges.  It  would  seem  appropriate  that  one  tool  which  should  be  included  in 
the  tracker  repertoire  is  the  ability  to  detect  the  use  edge  information. 

Fig.  9  shows  edge  intensity  information  extracted  from  the  original  target  in 
Fig.  1.  A  gradient  operator  over  a  3x5  window  was  used. 

We  propose  to  include  edge  information  as  follows:  when  local  extrema 
occur  at  points  with  large  edge  values,  they  will  be  considered  edge  extrema. 
These  edge  extrema  can  then  be  used  as  follows:  (1)  the  presence  of  an  edge 
will  enhance  the  possibility  of  a  region  being  listed  as  a  potential  target 
area,  and  (2)  windows  which  contain  edges  will  be  modified  so  that  edge  points 
control  the  window  shape.  Consider  Fig.  10  as  an  example.  In  this  figure, 
detected  edge  points  are  marked  with  an  "En.  To  perform  a  window  operation, 
scanning  starts  at  the  center  point  and  moves  in  the  numbered  directions, 
stopping  at  an  edge  point  or  at  the  window  edge.  This  allows  for  non-rectangu- 
lar  windows  along  the  boundary  of  targets. 

E.  Extraction  of  Binary  Images 

The  purpose  of  all  preprocessing  and  segmentation  techniques  described  in 
prior  sections  Is  to  produce  binary  images  which  are  potential  targets  for 
tracking.  Information  to  be  used  in  the  selection  of  binary  images  will  be: 

(1)  segmented  regions  from  section  B, 

(2)  texture  information  from  section  C, 

(3)  edge  information  from  section  D, 

(k)  prior  frame  Information. 
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Our  Initial  protocol  for  potential  target  selection  is  shown  in  the  flow 
chart  in  Fig.  II. 

F.  Structure  Analysis  of  Binary  Images 

The  purpose  of  structure  analysis  is  to  find  structure  differences  among 
various  targets  such  as  airplanes,  helicopters,  missiles,  and  balloons.  The 
structure  analyzer  can  operate  on  the  segmented  image  from  the  previous  sections 
We  are  presently  investigating  the  potential  of  several  structure  analysis 
methods  and  will  report  on  these  results  at  a  later  time. 
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Fig.  3  Threshold  versions  of  images 
in  Fig.  2. 


Fig.  1  Sample  targets  from  TV 
video  each  block  is  6^x128, 
expanded  to  128x128  (1  field  only) 
The  first  five  targets  are  air¬ 
planes,  the  next  three  are  cruise 
missiles,  the  last  eight  are 
missiles. 


Fig.  2  Cruise  missile  preprocess 
ing.  Original  (upper  left), 
median  filtered  (upper  right), 
averaged,  median  filtered 
(lower  left),  averaged  only 
(lower  right) 


Fig.  k  Missile  original,  6**xl28 


Fig.  5  Spatial  bandpass  filter 
on  Fig.  it 


Fig.  10.  Example  of  edge  points  controlling 
window  shape. 
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A  SPECIAL  COMPUTER  ARCHITECTURE  FOR  IMAGE  PROCESSING 
K.  S.  Fu  and  Janmln  Keng 

Image  processing  by  computer  encompasses  a  wide  variety  of  techniques 
and  mathematical  tools.  In  most  image  processing,  large  computers  have  been 
employed.  Unfortunately,  the  cost  is  high.  Image  processing  tasks  usually 
involve  an  extremely  large  volume  of  data,  much  of  which  can  be  operated  on 
in  parallel.  Therefore,  it  is  important  to  study  special  computer  architec¬ 
ture  for  image  processing.  During  the  last  two  decades  the  field  of  image 
processing  has  grown  up  rapidly.  New  techniques,  algorithms,  and  applications 
have  been  developed,  but  there  is  still  a  need  for  improved  hardware.  A 
special  computer  architecture  is  presented  here  as  a  proposal  for  improving 
the  state  of  the  art  in  image  processing  and  also  to  cut  costs.  Designing 
this  computer  architecture  was  a  challenging  problem,  as  the  desire  was  to 
build  a  computer  that  would  have  the  following  features: 

1)  The  computer  was  to  allow  efficient  Image  processing  at  high  speed 
utilizing  interactive  computation  and  making  possible  large  data 
eval uation. 

2)  The  computer  was  to  preserve  the  general  purpose  aspects  of  a  general 
purpose  computer. 

3)  The  computer  was  to  be  cost  effective  in  order  to  allow  industrial 
real ization. 

The  framework  of  this  proposed  computer  architecture  consists  of  a  task 
management  processor,  a  parallel  processor,  and  a  sequential  arithmetic  pro¬ 
cessor.  The  task  management  processor  is  a  set  of  software  system  programs 
serving  as  an  operating  system.  The  parallel  processor  is  proposed  because 
of  the  parallel  nature  of  the  operations  involved  in  image  processing.  Para¬ 
llel  machines  are  considered  to  be  particularly  suitable  for  image  processing 
by  Thurber  and  Wald  [73.  The  parallel  processor  of  the  proposed  computer. 
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architecture  consists  of  an  array  of  microprocessors  which  allow  parallel 
processing  capability.  This  parallel  processor  is  a  homogeneous  system  in 
which  all  processors  are  alike  and  are  genera)  purpose  units.  The  homogeneous 
parallel  processor  lends  Itself  quite  readily  to  extendabi 1 1 ty  as  such  systems 
are  usually  modularly  constructed.  With  modular  parallel  processing,  the 
system's  memory,  processor,  and  input/output  modules  may  be  enlarged  as  pro¬ 
cessing  requirements  increase;  thereby  avoiding  replacement  of  the  entire 
parallel  processing  system.  The  modular  parallel  processor  also  provides 
very  high  reliability  since,  with  several  identical  modules  of  each  types, 
the  system  can  withstand  failures  in  several  modules  and  still  operate.  This 
arrangement  also  increases  efficiency  and  through-put  since  all  of  the  pro¬ 
cessors  could  be  operating  s imul taneous ly.  The  sequential  arithmetic  pro¬ 
cessor  is  a  microprogrammed  controlled  processor  which  perforins  the  sequen¬ 
tial  arithmetics  and  also  controls  the  input/output  devices.  The  details  of 
these  processors  will  be  set  forth  in  section  2.  It  will  be  shown  that  the 
design  goals  were  achieved  through  the  proposed  computer  architecture. 

PREVIOUS  WORK  AND  COMMENTS 

The  previous  work  in  the  field  of  special  purpose  computer  architecture 
for  image  processing  basically  falls  into  two  categories:  bit-plane  process¬ 
ing  and  distributed  processing.  The  bit-plane  processing  approach  performs 
the  arithmetic  computation  on  the  image  points  which  are  stored  in  Boolean 
bit-plane.  For  example.  If  the  image  has  eight  grey  levels,  then  the  image 
points  are  stored  in  three  Boolean  planes.  The  bit-plane  processing  approach 
tends  to  have  a  large  number  of  processors  which  perform  Boolean  operations. 
The  distributed  computing  approach  utilizes  processors  which  have  powerful 
computation  capability.  These  processors  are  designated  by  microprogram  or 
hardware  to  execute  certain  specific  tasks.  Thus,  this  configuration  forms 
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a  distributed  computing  architecture.  A  detailed  illustration  of  the  previous 
special  purpose  computers  for  image  processing  Is  presented  in  our  work  [I]. 
The  comparison  of  those  computers  is  also  discussed  in  [1]. 

Comparing  the  bit-plane  processing  approach  and  the  distributed  computing 
approach,  the  bit-plane  approach  mostly  uses  Boolean  operators  as  processors. 
Boolean  operators  work  only  on  binary  images  which  are  not  common  in  the  real 
world.  One  way  to  get  around  this  is  to  use  several  binary  picture  planes  to 
represent  the  grey  scale  values  of  picture  poihts.  But,  the  complex  software 
and  additional  memory  requirement  cause  another  problem  and  this  problem 
limits  the  processing  power  of  the  processor.  One  of  the  drawbacks  of  the 
bit-plane  processing  approach  Is  that  the  processing  power  of  the  processor 
may  not  be  adequate  for  some  of  the  more  sophisticated  image  processing  tech¬ 
niques.  For  example,  the  parallel  Picture  Processing  Machine  PPM  (now  called 
PICAP)  has  been  shown  in  Kruse  [8]  to  be  applicable  only  to  the  preprocessing 
part  of  fingerprint  classification  and  as  stated  above  syntactic  techniques 
have  to  be  performed  by  the  conventional  computer  In  the  PICAP  system.  Thus, 
the  feasibility  of  the  bit-plane  approach  to  perform  highly  sophisticated,  but 
important,  techniques  is,  at  this  time,  unsure.  The  capability  of  real-time 
processing  of  the  bit-plane  processing  computers  such  as  CLIP  3*  CLIP4,  and 
PPM  is  very  difficult  to  ascertain  until  more  complicated  techniques  have 
been  implemented  by  these  computers  using  pictures  with  more  grey  levels  such 
as  128  or  256. 

After  studying  the  feasibilities  of  the  bit-plane  processing  and  dis¬ 
tributed  computing  approaches  to  the  real  world  Image  processing  task,  we  feel 
that  the  distributed  computing  approach  Is  more  feasible  considering  the  pre¬ 
sent  state  of  the  art  with  respect  to  both  software  and  hardware.  In  March 
1977 •  Stone  [6]  Indicated  that  the  distributed  computing  approach  is  one  of 


the  future  trends  for  general  computer  architecture.  His  remark  supports  our 
judgement  on  the  distributed  computing  approach  for  special  computer  archi¬ 
tecture  for  image  processing.  A  major  drawback  of  previous  computers  de¬ 
signed  by  distributed  computing  approach  is  that  the  processor  systems  are  not 
reconf igurabie.  The  vast  varieties  of  sensor  types,  applications,  and  image 
processing  techniques,  require  that  the  image  processing  system  (especially 
the  parallel  processor)  be  reconf igurabie.  Therefore,  a  generalized  computer 
architecture  which  is  reconf igurabie  under  software  control  is  proposed  in  the 
next  section  for  the  many  applications  of  image  processing.  Not  only  is  the 
concept  of  reconf igure-abi 1 i ty  new  for  special  purpose  computer  architectures 
for  image  processing,  but  also  the  methods  of  exploitation  of  parallelism  is 
new.  In  the  proposed  computer  arch i tecture  parai lei  ism  within  the  task  is 
exploited  by  the  parallel  processor.  In  the  meantime  the  operations  of  the 
sequential  arithmetic  processor  -re  pipelined  with  the  parallel  processor 
under  programmer  control  in  certain  tasks  which  can  be  decomposed  into  pipe¬ 
lined  processing.  Therefore,  parallelism  and  pipelining  are  exploited  at  the 
same  time  in  the  proposed  computer  architecture.  The  parallelism  is  referred 
to  the  multiprocessing  approach  which  subdivides  each  outcoming  job  among 
many  identically  constructed  mechanisms.  The  piping,  or  overlap  is  referred 
to  another  multiprocessing  approach  which  is  to  develop  a  collection  of  spec¬ 
ialized  mechanisms  capable  of  working  simultaneously  to  form  a  general  purpose 
organization.  The  processing  time  of  the  image  processing  task  by  the  pro¬ 
posed  computer  architecture  will  be  sped  up  by  a  factor  which  is  comparable 
to  the  amount  of  parallelism  and  pipelining  existing  in  the  image  processing 


task  of  interest 


PHYSICAL  ORGANIZATION  AND  CONTROL  FLOW  OF  THE  PROPOSED  COMPUTER  ARCHITECTURE 


The  proposed  Computer  Architecture  for  Image  Processing  is  called  CAIP 
and  is  designed  using  the  most  recent  semiconductor  technology.  The  physical 
organization  and  control  flow  are  as  follows: 

Physical  Organization  of  the  Proposed  Computer  Architecture 

The  physical  organization  of  special  computer  architecture  for  image 
processing  (CAIP)  comprises  the  task  management  processor,  the  control  units, 
the  parallel  processor,  the  sequential  arithmetic  processor,  and  the  memory 
organization. 

1)  Task  Management  Processor  (TMP)  is  a  set  of  software  programs  which  al¬ 
locate  the  jobs  to  the  parallel  processor  (PP)  or  Sequential  Arithmetic 
Processor  (SAP).  The  set  of  software  programs  include  a  task  control  program, 
a  job  control  program,  an  input/output  program,  and  the  language  translation 
program.  The  task  control  program  provides  the  logical  interface  between  the 
hardware  and  the  remainder  of  the  software  system  and  is  responsible  for  the 
allocation  of  Jobs  to  the  parallel  processor  and  sequential  arithmetic  pro¬ 
cessor.  Each  task  has  a  tag  which  is  designated  by  the  programmer  for  the 
Identification  of  parallel  processing  or  sequential  processing.  One  part  of 
the  task  control  program  is  called  the  tag  examination  program  which  examines 
the  tags  on  tasks  and  allocates  the  tasks  to  the  proper  processor.  Following 
the  tag  examination,  the  initiation  program,  which  is  another  part  of  the  task 
control  program,  initiates  the  parallel  processor  or  the  sequential  arithmetic 
processor.  In  general,  the  task  control  program  performs  scheduling,  super¬ 
vision,  interruption  handling,  execution  supervision,  and  clock  supervision. 
The  job  control  program  provides  a  logical  interface  between  a  task  and  a  job 
or  between  a  task  and  the  system  operator.  The  job  control  program  analyzes 
the  job  stream,  looks  at  system  resources,  processes  job  execution  and 
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termination,  and  communicates  between  the  system  operator  and  the  Individual 
job  program.  The  I/O  control  program  provides  an  interface  between  the 
processing  programs  and  the  I/O  devices.  The  I/O  control  program  performs  I/O 
supervision,  access  routine  processing,  and  I/O  device  initiation.  The 
language  translator  program  translates  the  computer  language  into  machine  codes 
and  compiles  the  program  to  be  executed. 

r 

2)  The  Control  Unit  (CU)  consists  of  two  sets  of  software  programs.  One 
control  unit  (CUPP)  is  for  the  parallel  processor,  and  the  other  (CUSA)  is  for 
the  sequential  arithmetic  processor.  The  CUPP  and  CUSA  are  different  from 
conventional  processors,  in  that  they  are  software  programs  which  control  the 
oDeration  of  the  parallel  processor  and  the  sequential  arithmetic  processor 
respectively.  The  CUPP  is  a  control  program  which  initiates  two  different 
sets  of  software  operating  systems,  one  for  SIMD  mode  and  the  other  for  MIMD 
mode.  The  two  sets  of  software  operating  systems  drive  the  parallel  processor 
individually  upon  the  command  of  CUPP.  The  reconfiguration  from  SIMD  mode  to 
MIMD  mode  or  MIMD  mode  to  SIMD  mode  are  performed  by  loading  the  operating  sys¬ 
tem  corresponding  to  the  desired  mode.  Next,  the  operating  system  is  assigned 
to  the  parallel  processor  (PP)  by  the  control  program  of  CUPP.  Hence,  the 
parallel  processor  operates  in  either  SIMD  or  MIMD  modes  under  the  respective 
operating  systems.  Through  this  arrangement,  the  CUPP  reconfigurates  the 
computer  architecture  from  SIMD  to  MIMD  or  MIMD  to  SIMD,  This  reconf igurable 
capability  enables  this  computer  architecture  to  satisfy  the  large  variety  of 
applications  of  image  processing.  The  operating  system  of  MIMD  mode  includes 
scheduling  routines,  dynamic  allocating  routines,  and  dispatching  routines. 

The  scheduling  routines  schedule  each  job  depending  on  job  priority  and  facil¬ 
ity  requirements.  The  dynamic  allocating  routines  take  jobs  set  up  by  the 
scheduling  routines  and  partition  the  set  of  processors  according  to  the  needs 


of  each  job.  The  dispatching  routine  dispatches  the  processors  when  the  job 
terminates  or  some  higher  priority  task  requires  processors.  The  operating 
system  for  the  SIMD  mode  is  on  the  master  control  unit.  All  the  processors 
of  the  parallel  processor  (PP)  are  controlled  by  this  master  control  unit  and 
thereby  an  instruction  is  executed  simultaneously  on  all  the  processors.  The 
control  unit  of  the  sequential  arithmetic  processor  (CUSA)  controls  the  se¬ 
quential  arithmetic  processor  which  is  a  microprogram-controlled  Bipolar  pro 
cessor.  The  control  units  are  shown  in  Fig.  1.  Note  that  Figs.  1,  2,  3,  and 
Fig.  4  form  a  graphical  illustration  of  this  computer  architecture  (CAIP)  by 
linking  the  corresponding  symbols  a,  B,  y,  o  in  the  figures. 

3)  The  Parallel  Processor  (PP)  is  an  array  of  microprocessors.  For  the  para¬ 
llel  processor,  N  microprocessors  are  connected  in  an  array  fashion.  The  array 
organization  is  especially  suitable  for  image  processing  [73 .  The  number  N  is 
determined  from  the  tradeoff  considerations  between  performance  and  cost.  The 
optimal  number,  N,  varies  with  the  task.  Hence,  N  can  only  be  determined  at 
the  time  of  implementation.  In  the  framework  shown  in  Fig.  2,  a  set  of  64 
microprocessors  is  used  to  give  an  idea  of  the  dimension  of  the  problem.  The 
control  unit  (CUPP)  controls  this  set  of  microprocessors  in  SIMD  or  MIMD  modes. 
This  control  unit  enables  the  parallel  processor  (PP)  to  have  a  higher  degree 
of  flexibility  and  processing  power.  The  SIMD  mode  utilizes  a  single  master 
control  unit  which  drives  the  multiple  processing  units  (microprocessors),  all 
of  which  either  execute  or  ignore  the  current  Instruction.  This  SIMD  mode  is 
especially  useful  for  the  cases  in  which  there  exists  (1)  a  large  amount  of 
independent  data,  (2)  no  restrictions  preventing  them  from  being  processed  in 
parallel,  (3)  a  requirement  for  high  throughput,  and  (4)  a  possibility  of  ex¬ 
ploiting  the  associate  addressing  selection  technique.  Thus,  SIMD  mode  is 
suitable  for  the  local  task,  which  executes  the  same  instruction  on  each 
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Figure  1.  Control  units  of  designed  computer  architecture 
for  image  processing. 
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picture  element  within  an  image  window.  The  MIMD  mode  utilizes  N  processors 
and  N  memories  where  each  processor  follows  an  independent  instruction  stream. 
The  parallel  processor  (PP)  is  connected  by  crossbar  switches  to  an  inter¬ 
leaved  memory  system  which  divides  the  ordinary  memory  into  modules  and  the 
consecutive  data  are  stored  in  different  modules.  The  interleaved  memory  sys¬ 
tem  is  used  because  the  bandwidth  is  greater  than  a  conventional  memory  sys¬ 
tem  and  can  be  multiply-accessed  which  is  more  appropriate  for  parallel  pro¬ 
cessing  than  a  conventional  system,  in  which  data  can  only  be  accessed  one-at- 
a  time. 

k)  The  Sequential  Arithmetic  Processor  (SAP)  is  a  microprogram-controlled 
processor.  Mini-and  micro-computers  are  not  used  here  because  user  micro- 
programmable  capability  and  Bipolar  processor  are  not  furnished  by  the  usual 
mini-  or  micro-computers.  The  Sequential  Arithmetic  Processor  (SAP)  is  a 
Bipolar  processor  which  is  a  processor  built  by  Bipolar  semiconductor  tech¬ 
nology  and  usually  has  the  bit  slicing  capability.  The  Bipolar  processor 
permits  the  designer  to  define  his  own  instruction  set  and  the  associated 
hardware  architecture  to  achieve  special  capabilities,  such  as,  variable  word 
length  capability,  or  to  perform  an  application  with  the  highest  efficiency. 
The  Bipolar  processor  expands  the  CPU  word  length  by  cascading  the  needed 
number  of  bit-slice  microprocessor  components.  This  variable  word  length 
capability  of  SAP  makes  it  more  general  and  powerful  than  other  micro¬ 
processors.  Along  with  this  processor,  a  microprogram  memory  is  needed  to 
store  the  microprograms  of  certain  programs  frequently  in  use.  For  the 
microprograms  of  Image  processing  techniques,  no  Instruction  fetching  is  re¬ 
quired  because  of  the  coding  of  the  microprogram.  The  microprogram  capabil¬ 
ity  saves  processing  time  according  to  the  ratio  of  instruction  fetch  time 
to  total  execution  time.  The  Sequential  Arithmetic  Processor  Is  shown  in 
Fig.  3. 


Figure  3 


Sequential  arithmetic  processor  of 
designed  computer  architecture. 
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5)  The  Memory  Hierarchy  Organization  is  the  memory  system  for  SAP.  The 
picture  is  stored  on  a  magnetic  disk  memory  which  is  more  economical  than  core 
memory,  but  memory  access  time  is  long.  The  memory  access  time  of  the  Bipolar 
memory  is  faster  by  a  factor  of  100  than  disk  memory  [9].  Therefore,  a  semi¬ 
conductor  Bipolar  memory  is  connected  to  the  disk  memory  as  working  memory 
space  and  buffer.  The  hierarchy  organization  is  as  follows:  the  picture  area 
which  is  to  be  processed  is  loaded  onto  the  Bipolar  memory  from  the  disk  memory, 
then  the  processor  gets  the  data  (picture  points)  from  the  Bipolar  memory  thus 
allowing  extremely  fast  memory  access  time.  In  order  to  avoid  being  delayed  by 
the  loading  time  from  disk  to  Bipolar  memory,  a  Bipolar  Memory  Buffer  (BMB)  is 
used.  While  the  processor  is  reading  the  data  into  the  Bipolar  working  memory 
space,  the  next  picture  area  is  loaded  onto  the  Bipolar  Memory  Buffer  (BMB). 
Thus,  this  memory  preloading  makes  the  data  always  ready  in  the  fast-access 
Bipolar  memory.  This  memory  hierarchy  organization  is  illustrated  in  Fig.  4. 

The  proposed  memory  hierarchy  loads  a  large  block  of  data  onto  the  Bipolar 
Buffer  Memory  as  image  processing  necessitates  operations  on  large  blocks  of 
data,  this  ability  of  the  CAIP  through  its  BMB  provides  a  marked  advantage  for 
image  processing.  Once  the  large  block  of  data  has  been  loaded  onto  the 
Bipolar  memory,  any  individual  data  point  can  be  randomly  accessed  and  in¬ 
dividual  loading  from  primary  to  secondary  storage  is  not  needed.  Thus,  the 
user's  software  becomes  simpler  by  virtue  of  this  proposed  memory  hierarchy 
organization.  The  Bipolar  memory  is  used  here  because  the  Bipolar  memory  is 
the  memory  device  with  the  fast  fetch-time  in  present  technology  [9].  There 
is  only  one  disadvantage  to  the  use  of  Bipolar  memory:  It  is  more  expensive 
than  core  memory.  Instead  of  using  Bipolar  memory,  core  memory  can  be  used 
in  the  proposed  memory  hierarchy  if  the  cost  is  a  great  constraint  to  the 
user. 
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Data  out  from  Memory 


Memory  hierarchy  of  designed  computer  architecture. 
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Control  Flow  of  the  Proposed  Computer  Architecture 


The  control  flow  for  the  proposed  computer  architecture,  CAIP,  is  shown 
in  Fig.  5.  The  Task  Management  Processor  (TMP)  allocates  jobs  to  the  parallel 
processor  (l*P)  or  the  sequential  arithmetic  processor  (SAP)  by  means  of  the 
tag  examination  routine.  The  user's  program  provides  a  flag  to  the  parallel 
processor  which  indicates  whether  the  task  is  local  or  global.  The  locality 
tester  examines  the  flag  and  initiates  the  SIMD  or  MIMD  mode.  The  SIMD  mode 
is  appropriate  for  local  tasks.  In  this  mode  a  single  instruction  is  executed 
simultaneously  on  the  image  points.  The  local  task  means  that  the  instruction 
is  executed  on  individual  data  within  an  image  window,  such  as  calculating 
the  histogram  of  an  image  window.  For  global  task,  the  MIMD  mode  is  employed. 
The  g’obal  task  means  that  part  of  the  task  is  performing  one  kind  of  opera¬ 
tion  and  other  part  of  the  task  is  performing  another  kind  of  operation.  For 
example.  In  the  task  of  evaluating  textured  and  non textured  areas ,  some  pro¬ 
cessors  perform  second  order  statistical  texture  analysis  on  certain  window 
and  some  processors  perform  first  order  mean  vector  analysis  on  the  cor¬ 
responding  windows.  This  global  task  comprises  of  two  different  natures  of 
subtasks.  The  outputs  from  the  Sequential  Arithmetic  Processor  (SAP)  com¬ 
municates  with  the  parallel  processor.  Therefore,  the  SAP  may  support  the  PP 
and  vice  versa.  Parallelism  of  task  is  exploited  by  the  parallel  processor 
(PP)  to  obtain  high  speed  performance.  In  the  meantime  the  operations  of  the 
sequential  arithmetic  processor  (SAP)  are  pipelined  to  the  parallel  processor 
under  program  control  in  certain  tasks  which  can  be  decomposed  into  pipelined 
processing.  Therefore,  the  control  flow  of  this  architecture  exhibits  both 
parallelism  and  pipelining  .  This  arrangement  has  not  been  incorporated  into 
any  of  the  previously  proposed  systems.  Since  two  types  of  multi¬ 
processing  parallel  processing  and  pipeline  processing,  are  exploited 
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CONTROL  FLOW  OF  OESIGNED  COMPUTER  ARCHITECTURE 


Figure  5.  Control  flow  of  designed  computer  architecture 
for  image  processing,  array  processing. 
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simultaneously,  this  contributes  to  high  speed  performance  in  the  proposed 
computer  architecture. 

ANALYSIS 

Performance  and  Cost-Effectiveness  Tradeoffs 

The  parallel  processor  (PP)  of  the  proposed  computer  architecture  con¬ 
sists  of  an  array  of  microprocessors  which  are  the  processing  elements.  In 
determining  the  optimal  number,  N,  of  microprocessors  for  the  parallel  pro¬ 
cessor  (PP) ,  a  systematic  procedure  is  needed.  Such  a  procedure  follows: 

In  designing  the  parallel  processor,  as  the  number  of  processing  elements 
(microprocessors)  increases,  the  number  of  data  points  processed  by  each  pro¬ 
cessor  decreases  and  processing  speed  increases,  but  the  scheduling  overhead 
also  increases.  Therefore,  the  processing  speed  improvement  reaches  a  sat¬ 
uration  point  at  a  certain  number  of  processing  elements.  So  it  seems  that 
the  number  of  processing  elements  corresponding  to  the  saturation  point  would 
be  a  good  choice.  However,  the  answer  is  not  that  simple,  as  cost-effective¬ 
ness  is  an  important  factor  in  the  feasibility  of  a  computer  architecture. 

Thus,  the  costs,  such  as  hardware  and  software  costs  of  the  parallel  processor 
(PP) ,  need  to  be  considered.  Hardware  cost  usually  involves  the  hardware  pur¬ 
chased.  Software  cost  refers  to  development  of  the  operating  systems  and  soft¬ 
ware  supports.  The  hardware  cost  increases  with  the  increase  In  the  number  of 
processing  elements  (microprocessors).  The  software  cost  increases  more  rapid¬ 
ly  than  the  hardware  cost  as  the  number  of  processing  elements  increases.  The 
best  choice  of  optimal  number  of  processing  elements  is  obtained  by  evaluating 
the  performance  improvement  and  cost  increment  on  different  image  processing 
tasks.  The  optimal  number  directly  depends  on  the  specifications  which  are 
given  by  the  user.  For  example,  if  the  performance  curve  becomes  saturated  at 
80  processors  for  task  A,  and  70  for  task  B  shown  in  Fig.  6,  N  should  be  in 
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between  70  and  80.  The  hardware  cost  increases  more  or  less  linearly  with  the 
number  of  processors  (if  fewer  than  100  processors  are  bought).  But,  as  stated 
above,  the  software  cost  Increases  more  rapidly  than  the  hardware  cost.  If  the 
software  cost  increases  sharply  at  60  processors,  as  in  Fig.  7,  the  optimal 
number  of  processors,  considering  cost  and  performance  trade-off,  is  between 
60  and  80.  Depending  on  the  specifications  of  the  user,  if  the  concern  is 
more  for  performance  than  cost,  then  a  number  near  80  is  chosen.  If  the  user 
is  more  concerned  about  cost,  then  a  number  near  60  should  be  chosen. 
Implementation  of  Statistical  Methods 

During  recent  years,  a  number  of  image  processing  algorithms  have  been 
developed  [2,3].  In  this  section,  some  image  processing  techniques  are  dis¬ 
cussed  in  terms  of  the  proposed  computer  architecture,  CAIP,  to  exemplify  the 
operations  of  the  special  computer  for  image  processing. 

Statistical  texture  analysis  has  been  an  important  topic  in  the  field  of 
image  processing.  The  texture  analysis  technique  in  t^]  consists  of  histogram 
equalization,  and  texture  feature  measurement.  In  applying  the  proposed  com¬ 
puter  architecture,  CAIP,  to  such  a  texture  analysis  technique,  the  task  of 
texture  analysis  is  assigned  a  tag  P  (which  stands  for  parallel  processor  task) 
by  the  user's  program.  (Tag  S  stands  for  sequential  arithmetic  processor  task). 
The  subtasks  of  texture  analysis,  such  as  histogram  equalization  and  texture 
feature  measurement,  are  designated  by  the  flags  SI  (which  denotes  the  SIMD 
mode  for  the  task),  and  HI  (which  denotes  the  MIMD  mode  for  the  task).  Assum¬ 
ing  the  size  of  the  image  is  MxM.  The  texture  analysis  task  is  allocated  to 
the  parallel  processor  (PP)  of  the  CAIP  by  the  tag  examination  routine  of  the 
Task  Management  Processor  (TMP).  The  Control  Unit  of  the  Parallel  Processor 
(CUPP)  finds  the  flag,  SI,  for  histogram  equalization  and  loads  the  operating 
system  of  the  SIMO  mode.  Each  processor  of  the  N  microprocessors  then 
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M  M 

calculates  the  histogram  of  an  -  x  -  picture  window.  The  outputs  of  the 

/FT  /FT 

N  processors  are  then  put  together  to  obtain  the  equalization  result  of  the 
final  histogram.  The  CUPP  keeps  the  parallel  processor  (PP)  in  the  SIMD  mode 
after  examining  the  falg  SI  of  the  texture  feature  measu  ent  task.  For 
example,  if  the  88x88  picture  discussed  in  [4]  is  to  be  processed,  each  pro¬ 
cessor  will  process  the  co-occurrence  matrix  of  the  window  of  llxll  pixels. 

The  variability  texture  feature  measurement  is  calculated  by  each  processor 
as  the  texture  value  for  the  center  cell  (4x4)  of  that  window.  The  mapping 
of  the  array  of  processors  to  the  image  points  is  shifted  four  pixels  and 
repeats  the  texture  feature  measurement  task.  When  this  shift  reaches  the 
right  edge  of  image,  the  mapping  is  shifted  four  pixels  downward  and  the  pro¬ 
cess  is  repeated  from  the  left  most  column  of  the  image.  This  process  con¬ 
tinues  until  the  texture  values  of  all  the  picture  points  are  obtained. 
Syntactic  Methods  and  Parallel  Processing 

As  has  been  pointed  out  previously,  syntactic  methods  for  image  process¬ 
ing  have  increased  in  importance  for  certain  applications.  Previous  special 
computers  such  as  the  PPM  and  the  PI  CAP  are  unable  to  process  syntactic 
methods  by  parallel  processing  [8],  However,  the  proposed  CAIP  furnishes  the 
capability  of  parallel  processing  of  syntactic  algorithm.  In  using  the  para¬ 
llel  processor  for  syntactic  methods,  parallel  parsing  schemes  are  most  de¬ 
sirable  as  they  can  utilize  the  capability  of  the  parallel  processor.  Un¬ 
fortunately,  the  research  in  parallel  parsing  schemes  is  very  limited.  In  this 
section,  we  introduce  the  parallel  parsing  of  tree  languages  and  explore  the 
parallel  parsing  of  the  parallel  context-free  languages  [5]. 

1)  Tree  Languages  and  Parallel  Processing.  The  parallel  processor  (PP)  of 
the  proposed  computer  architecture  for  image  processing  can  be  applied  to  tree 
grammar  parsing.  The  parallel  parsing  procedure  of  a  tree  grammar  is  described 
below.  The  task  of  tree  grammar  parsing  is  designated  by  a  flag  P  and  tag  SI 
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by  the  user's  program  for  the  effective  utilization  of  the  facilities.  Through 
the  control  unit  of  the  computer  architecture,  the  parallel  processor  is  put 
into  the  SIMD  made  for  the  task  of  tree  grammar  parsing.  If  a  production  rule 
of  the  tree  grammar  which  is  applied  to  parse  the  language  has  k  branches, 
then  each  of  these  k  branches  has  a  nonterminal.  Each  processor  of  the  para¬ 
llel  processor  (PP)  is  assigned  by  the  user  to  parse  one  nonterminal.  This 
procedure  is  applied  to  consecutive  parsing  of  the  language  until  a  final 
parsing  result  is  achieved.  If  the  parsing  is  successful,  then  the  language 
(pattern)  is  accepted.  If  the  parse  fails  then  the  language  (pattern)  is 
rej ected. 

For  example,  the  tree  grammar  is  =  (V,  r,  P,  S),  where  V  *  {S,  a,  b, 

$,  A,  B},  VT  =  Oa,  +b,  $},  r(a)  =  (2,  1,  0}r(b)  =  (2,  1,  0},  r($)=  {2  }  and 
P:  [2] 


rule  1: 
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This  grammar  generates  such  patterns  as 
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In  order  to  perform  parallel  parsing  of  the  tree  language,  processors  need  to 
be  assigned.  First,  the  depth  "d"  of  the  tree  Is  defined  as  the  number  of  the 
levels  cf  the  tree.  The  depth  of  the  tree  in  (1)  Is  2  (d=2)  and  the  depth  of 
the  tree  in  (2)  is  4  (d=4) .  The  maximum  number  of  branches  for  all  the  tree 
grammar  rules  is  easily  obtained  by  checking  the  values  of  r  in  the  grammar 
=  (V,  r,  P,  S)  and  this  number  is  called  m.  The  relationship  between  m 
and  r  is  that  m  is  the  maximum  of  the  values  of  the  r's.  The  number  of  needed 
processors  in  the  parallel  processor  (PP)  is  (d)m.  This  procedure  is  per¬ 
formed  for  the  worst  case  protection  concept  which  allows  the  maximum  number 
of  branching  in  each  level  of  the  tree  parsing.  If  the  number  (d)m  is  greater 
than  the  number  of  processors  of  the  parallel  processor  (PP),  there  are  two 
solutions:  One  is  the  static  priority  procedure,  and  the  other  is  the  dynamic 
priority  procedure.  The  static  priority  procedure  has  a  fixed  priority  rule 
for  assigning  the  available  processors  to  the  proper  subtasks.  The  fixed  pri¬ 
ority  rule  for  the  parallel  tree  parsing  scheme  is  to  assign  the  k  left  most 
branches  the  equal  priority  in  each  parsing  stage.  The  number,  k,  is  the 
number  of  available  processors  of  the  parallel  processor  (PP)  for  each  parsing 
stage.  At  parsing  stage  one,  the  k  left  most  branches  are  parsed  first,  based 
on  the  highest  priority  rule.  At  parsing  stage  two,  the  k  leftmost  branches 
(nonterminals)  of  stage  two  then  have  the  highest  priority  to  be  parsed.  Thus, 
the  k  available  processors  are  assigned  to  parse  these  nonterminals. 
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In  the  dynamic  priority  procedure,  a  dynamic  priority  rule  hrs  to  be 
established  at  each  stage  of  parsing  in  order  to  determine  which  subtasks  have 
the  highest  priority.  For  example,  the  k  left  most  priority  could  be  assigned 
first,  then  the  k  rightmost  priority  assigned  next  at  the  request  of  the  user. 
However,  in  parsing  the  tree  languages,  all  the  individual  branches  (non¬ 
terminals)  have  to  be  parsed  to  get  the  nodes  (terminals),  therefore,  the 
static  priority  procedure  is  better.  The  dynamic  priority  procedure  would 
only  be  used  in  special  cases,  such  as  the  case  in  which  only  a  partial  par¬ 
sing  result  is  of  interest. 

Using  the  example  given  above  to  illustrate  the  proposed  parallel  par¬ 
sing  scheme  for  tree  languages  and  to  compare  the  parsing  result  with  con¬ 
ventional  parsing  scheme  for  tree  languages,  if  the  depth  of  the  tree  to  be 

parsed  is,  at  most,  four  -  the  maximum  number  of  processors  needed  is  easily 

2 

calculated  to  be  (4)  =16.  In  the  parallel  parsing  of  the  input  tree  a. 
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The  task  is  allocated  to  the  parallel  processor  (PP)  by  the  P  flag  found  by 
the  Task  Management  Processor  (TMP)  and  the  parallel  processor  (PP)  is  placed 
in  the  SIMD  made  by  CUPP.  The  procedure  is  graphically  illustrated  in  Fig.  8. 
At  the  parsing  of  depth  one,  one  processor  parses  the  language  by  the  rule  1. 
At  the  parsing  of  depth  two,  two  processors  are  assigned.  The  number  of  pro¬ 
cessors  needed  for  the  parsing  of  depth  k  is  automatically  determined  by  the 
number  of  branches  obtained  from  the  parsing  of  depth  k-1.  The  number  of 
branches  obtained  from  depth  one  in  our  example  is  two.  Thus,  two  processors 
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Figure  8.  Parallel  tree  parsing  procedure. 


are  needed,  one  for  the  parsing  of  the  branch  starting  from  nonterminal  A  and 
the  other  for  the  parsing  of  the  branch  starting  from  nonterminal  B.  Grammar 
rules  2  and  3  are  simultaneously  applied  by  the  two  processors  to  parse  the 
two  branches  of  the  tree  a  starting  from  A  and  B  respectively.  At  the  parsing 
of  depth  three,  four  processors  are  needed.  Two  processors  simultaneously 
parse  the  branches  of  A  and  B  which  are  the  result  of  parsing  rule  2  in  the 
depth  two.  The  parsing  rules  for  these  two  processors  are  rules  h  and  3, 
respectively,  to  nonterminal  A  and  B.  The  other  two  processors  simultaneously 
apply  rules  6  and  5  to  parse  the  branches  of  A  and  B,  which  are  the  result  of 
parsing  by  rule  3  in  the  depth  two.  At  the  parsing  of  depth  four,  since  there 
are  only  two  nonterminals  A  and  B  left  from  the  parsing  of  depth  three,  two 
processors  are  assigned  to  simultaneously  parse  the  branches  A  and  B  by  rules 
i*  and  5  respectively.  Thus,  the  parallel  parsing  is  completed  and  the  tree 
is  accepted.  The  parsing  of  the  same  tree  language  by  the  conventional  se¬ 
quential  parsing  scheme  is  shown  in  Fig.  9.  At  each  parsing  stage,  only  one 
nonterminal  can  be  parsed  by  the  parser.  At  parsing  stage  one,  grammar  rule  1 
is  applied.  Rule  2  is  applied  at  parsing  stage  two.  Then  the  rules  *»,3,A,5,3, 
6,  and  5  are  applied  to  parsing  stages  3»**»5,6,7,8,  and  9  respectively.  Nine 
parsing  stages  are  needed  for  the  parsing  of  the  same  tree  as  shown  in  Fig.  9. 
It  can  be  seen  in  Fig.  8  that  only  four  parsing  stages  were  needed  for  parallel 
tree  parsing  scheme.  Thus,  in  this  example  there  Is  a  saving  of  over  50X  in 
parsing  stages,  and,  therefore,  a  corresponding  saving  in  time  by  utilizing 
this  parallel  parsing  scheme  on  the  proposed  computer. 

2)  Parallel  Context-Free  Language  and  Parallel  Processing.  The  parallel  context- 
free  language  was  defined  by  Siromoney  and  Krithlvason  in  [5l.  The  definition 
of  a  parallel  context-free  language  is  a  language  generated  by  a  context-free 
grammar  In  whclh  the  manner  of  applying  the  grammar  rules  is  restricted  as 
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follows:  if  a  nonterminal  occurs  more  than  once  in  a  sentential  form,  then 
every  occurrence  of  the  nonterminal  is  replaced  at  the  same  time  by  the  same 
rule. 

The  parallel  processor  (PP)  of  the  proposed  computer  architecture  for 
image  processing  (CAIP)  performs  the  parsing  of  a  parallel  context-free 
language  in  the  S I MD  mode.  From  the  definition  of  a  parallel  context-free 
language,  the  number  of  needed  processors  of  the  parallel  processor  is  equal 
to  the  maximum  number  of  occurrences  of  a  nonterminal  in  all  sentential  forms. 

A  processor  is  assigned  ur.Jer  programmer  control  to  each  nonterminal  when  it 
occurs  simultaneously  with  the  same  nonterminal  in  the  derivation.  These 
processors  perform  the  parsing  of  same  grammar  rules  on  these  nonterminals. 

For  example,  the  task  of  parsing  a  parallel  context-free  language  is 
assigned  the  flag  P  and  tag  SI  which  initiate  the  SIMD  mode  for  the  task. 

The  parallel  context-free  language  is  L (G)  =  {  a^  |  n  0}  [5].  The  parallel 
context-free  grammar  is  G  =  (V,  I,  P,$)  where  V  =  {  S,  a},  I  ■  (a),  and 
P  {  S  SS,  S  -*■  a}.  If  the  languages  to  be  parsed  are  aa  and  aaaa.  The  max¬ 
imum  number  of  occurrences  of  nonterminals  in  all  sentential  forms  is  four 
which  comes  from  aaaa  (nonterminal  SSSS).  Thus,  the  number  of  needed  pro¬ 
cessors  is  four.  At  the  first  stage  of  parsing  of  the  language  aaaa,  one 
processor  is  assigned  to  parse  the  language  and  the  grammatical  rule  is  S  ■*  SS . 
At  the  second  stage  of  parsing,  two  processors  apply  the  same  rule  S  +  SS  on 
the  two  nonterminals  "S"  and  the  parsing  result  is  SSSS.  At  the  third  stage 
of  parsing,  four  processors  apply  the  same  rule  S  -*•  a  on  all  the  four  non¬ 
terminals  "S".  Hence,  the  parsing  result  is  aaaa.  This  language  cannot  be 
parsed  sequentially  as  the  language  is  defined  to  be  parsed  only  parallelly. 

The  sentence  aaaa  of  this  example  is  parsed  by  the  parallel  context-free 
grammar  G.  Thus,  the  sentence  is  accepted  as  a  member  of  L(G). 
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SUMMARY  AND  REMARKS 

A  computer  architecture  for  image  processing  (CAIP)  has  been  proposed. 

This  computer  architecture  is  designed  by  the  distributed  computing  approach. 
This  computer  comprised  of  a  parallel  processor  (PP)  and  a  sequential  arith¬ 
metic  processor  (SAP).  The  reconfiguration  capability  of  the  CAIP  provides 
the  flexibility  of  this  computer  architecture  and  the  method  of  exploitation 
of  task  parallelism  gives  the  high  performance  of  this  computer  architecture. 

This  computer  architecture  for  image  processing  is  proposed  to  use  micro¬ 
processors  as  the  processing  elements.  The  advantage  of  using  a  microprocessor 
array  is  that  the  cost  of  microprocessors  is  much  lower  than  that  of  con¬ 
ventional  processors.  The  disadvantage  is  that  the  processing  power  of  micro¬ 
processors  is  less  than  that  of  conventional  processors,  especially  in  addres¬ 
sing  capability.  For  example,  the  most  popular  microprocessors  INTEL  8080 
and  MOTOROLA  6800  do  not  have  associate  addressing  or  microprogramming  abili¬ 
ties.  For  special  image  processing  computers,  some  processing  powers  of  con¬ 
ventional  processors,  such  as  associate  addressing,  are  not  essential  [7l. 

The  approach  of  designing  a  general  purpose  computer  by  microprocessors  has 
been  controversial.  But,  in  designing  a  special  purpose  computer  for  image 
processing,  microprocessors  have  their  advantages.  Furthermore,  the  advance 
in  semiconductor  technology  is  toward  the  development  of  microprocessors  with 
higher  processing  power.  The  recent  developments  in  the  microprocessors 
series  INTEL  3000,  MOTOROLA  M2900,  and  Texas  Instruments  71*5i*8l  have  provided 
microprogramming  capability  to  microprocessors. 

With  the  fast  growth  of  image  processing  and  its  applications,  the  need 
for  a  special  image  processing  machine  such  as  the  proposed  computer,  CAIP, 
should  certainly  be  appreciated. 
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